

In April 1965, a list-structured, real time chemical information system was
demonstrated on a miniature file of 1800 compounds. It was known at that time
that (1) a considerably more comprehensive set of chemical screens (or keys)
would be required to effectively partition a large scale chemical file for a
list-structured information retrieval system, and (2) some modification of the
list-structuring techniques would be required in order to process such large
files efficiently. CIDS No. 3, Comprehensive Summary Report: A Proposed Chemical
Information and Data System (December 1965), reported on the software
implementation of this demonstration system.

The requirements stated above have since been developed further, and both a
comprehensive chemical screen system and a more efficient real time and batched
automated processor for large scale files can now be reported. The former is
described in CIDS No. 4, An Experimental Chemical Information and Data System,
Status Report, and the latter is documented in this report, CIDS No. 5.

The basic difference in file organization between the system described in
CIDS No. 3 and the one described in this report is that the former used a
variant of threaded list structures called the Cellular Multilist, whereas the
present system uses inverted lists. The principal reason for this change is
that the exceedingly long list lengths* produced by the screening system of
CIDS No. 4, when used to process queries that contain Boolean combinations
(conjunctions, disjunctions and negations), must for the foreseeable future be
stored as inverted lists on mass random access memories.

The system has been subjected to user-oriented tests comprising 273 questions
on a file of 290,000 compounds. The list lengths for this file ranged from 1 to
343,473. Results of these tests, including statistics on the cost of assigning
the screens, generating the list-structured search files and searching the
files with the batched processing system, are presented in the document
entitled Report to the AMC User Advisory Group on the Initial Test of an
Experimental CIDS, 2 October 1967.

In addition to a considerably more comprehensive chemical screen assignment and
a more efficient list search implementation, this report contains the complete
documentation of the CHEMTYPE system, which converts structural formulas typed
on a chemical typewriter to connection tables, and of the CIDS isomer sort
registry system. Thus, the system described in this report processes chemical
structures plus auxiliary data in the following major steps: (1) edited hard
copy of the structures and data is typed on a chemical typewriter; (2) the
CHEMTYPE system generates connection tables and formats the non-structural data;
(3) the structure records are registered via the isomer sort registry system;
(2a, 3a) alternatively, the system can accept connection tables from the CAS
registry system; (4) structural screens are automatically assigned; (5) the
list-structured search file is generated for either the real time or the batched
system; (6) the files are searched by the real time, on-line system or by the
batched system. The output of the search system is the structural formula plus
all of the associated data. At present the formula is printed on a Dura Mach
chemical typewriter or a Data Products line printer. Display on a cathode ray
tube will be implemented shortly.

* A list is created for every screen; that list contains all compounds that
contain or are described by the given screen.
COMPUTER PROGRAMMING FOR AN
EXPERIMENTAL CHEMICAL INFORMATION AND DATA SYSTEM


1. INTRODUCTION

This report describes all programs written to date for the experimental U.S.
Army Chemical Information and Data System. They are interim programs in the
sense that they will be augmented and refined, in accord with present plans and
future experience, to meet more fully the requirements of an operational system.
Disclosure at this interim stage is designed primarily to acquaint computer
systems analysts with (a) the basic design principles of the system, and (b)
sufficient programming and logic detail to indicate how these principles have
been implemented on the IBM 7040 computer.

The descriptions are presented at three levels. The Introduction presents an
overview of the entire system, including the relationships among the major
subsystems, the generation and flow of data within and through the system, and
the structure and content of the principal hardcore data record. The two
subsequent sections of the report are organized according to the two major
systems: the file generation system and the search system, the latter composed
of a batched search system and a real time search system. Within each of these
sections, the total system is described first, and then the individual programs
are described functionally and operationally.

The functional program descriptions relate only to how a given program functions
within its own executive environment, not to its relation to the system as a
whole. That is, each is a detailed description of the specific task that a given
program is to perform, intended for an analyst who might wish to see the design
outline of each individual program below the level of a total system functional
description. Such a description therefore includes a brief statement of program
function, called the Abstract, and a somewhat more detailed program description
which includes, where appropriate, a block diagram, buffers, lists, record
formats, input and output arguments, and related subprograms or main programs.

The operational descriptions similarly relate to individual programs and
include, where required, program operating instructions such as tape
requirements, interpretation of error messages and restart instructions.

No microflow charts or listings are provided in this documentation. All
programs are written in the MAP language for the IBM 7040.

As an aid to understanding the relationships among the programs described in
this report, a system flow chart of the complete computer system is presented in
Appendix A. The code names of the programs required to perform each stage of
processing are included. In Appendix B, abstracts for each program described in
this report are presented in alphabetical order by code name. Important data
formats appear in Appendix D as well as in appropriate places in the text.
Unless otherwise specified, all data elements are stored in binary.

This report does not contain a system functional description in the
instructional sense for potential users of the system. Four existing reports
and CIDS No. 6 will collectively provide a user-oriented system functional
description. The four existing reports are those identified by the numbers
(1), (4), (5), and (6) in the literature citations on page 233.

The CIDS No. 4 report (1) describes the system of structural keys that are
automatically assigned to the compounds and that serves as the algorithmic basis
of the screen assignment programs described in Section 2.4. The Guide to the
CIDS Retrieval Language (4) specifies the mode of querying both the batched and
the real time systems. The ACT II and III Chemical Typing Conventions (5) and (6)
specify the rules for editing and drawing structures for system input and the
rules for typing them along with other related data such as the molecular
formula and nomenclature.

Another document, the Report of the AMC User Advisory Group on the Initial Test
of an Experimental CIDS (2), is of correlative interest. It presents the
detailed results of a large scale experiment in which 180 structural questions
were submitted to this system at a time when the file size was 290,000
compounds.

Consonant with the purpose of producing an experimental system, the programs
described in this report continue to be modified and improved. Those most
subject to change are the screen assignment programs, since they relate directly
to the experiments and since the overall performance of the system is most
sensitive to the quality and balance of these screens. Other parts of the
system, such as the list structuring programs, have performed well and are more
stable, although it is planned to increase their efficiency somewhat in order to
better accommodate massive files. A few additional programs, which were
initially recognized as necessary for a maximally effective system but whose
development was intentionally deferred until results of large scale
experimentation were available, will be incorporated.

Fig. 1 presents the three systems that comprise the U.S. Army CIDS and defines
their interrelationship. These systems are labeled (A) File Construction,
(B) Batched Search, and (C) Real Time Search.

In System A the structural formula of a compound must be represented as a
connection table before it can be screened and added to the file. The CIDS file
construction programs accept these data from two sources, the Chemical Abstracts
Service registry system and the University of Pennsylvania CHEMTYPE system.* The
connection tables from either of these sources are formatted into a record along
with other data, and the structural screens** are automatically assigned. Then,
based upon the assigned keys (screens), an inverted list is generated for each
key, in which the addresses of all compound records having that key are listed
in sequence. The outputs of this file generation program are the inverted key
index and the search file. The program can produce such a list-structured file
for both the batch processing system (B) and the real time system (C). In the
batch system, the index is stored on magnetic disk for more efficient processing
of the inverted lists and the file is stored serially on magnetic tape; in the
real time system, both the index and the file are stored on magnetic disk.

*The CHEMTYPE system was developed at the University of Pennsylvania under
contract NSF C-467. Input, output and chemical verification programs required to
process CIDS compounds were produced under the University of Pennsylvania
Project CIDS, Contract DA-18-035-AMC-288 (A).

**See CIDS No. 4, Section 3, for a description of these screens.
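The inverted-list organization described above can be illustrated with a minimal Python sketch. It is not the MAP implementation: the function name `build_inverted_index`, the integer record addresses and the key names are hypothetical, and the index is held in memory rather than on disk. Each key maps to the ascending list of addresses of the records that carry it, which is the form the search stage needs.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each structural key to the ascending list of record addresses
    whose compounds carry that key.

    records: iterable of (record_address, keys) pairs.
    """
    index = defaultdict(set)
    for address, keys in records:
        for key in keys:
            index[key].add(address)
    # Sorted lists let the search stage combine two lists in a single pass.
    return {key: sorted(addresses) for key, addresses in index.items()}


# Hypothetical three-compound file with made-up key names.
file_records = [
    (101, {"RING6", "NITRILE"}),
    (102, {"RING6"}),
    (103, {"NITRILE", "METHYL"}),
]
index = build_inverted_index(file_records)
print(index["NITRILE"])  # [101, 103]
```

A screen that many compounds share simply yields a long list; nothing else in the structure changes, which is why very long lists are tolerable in this organization.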


The batched processing system, represented in Fig. 1 as (dashed) block B,
accepts queries from punched cards in batches of up to 2000 queries and
processes them in two stages. The query formats are described in Reference 4.
The first processing stage performs list intersections, merges, or deletions as
required by the logical expression of keys in the query. The number of compound
record addresses that respond in this stage is printed on the line printer as a
retrieval statistic.
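A sketch of the first-stage list operations follows, assuming each key's inverted list is an ascending list of record addresses and reading "intersection", "merge" and "deletion" as the AND, OR and AND NOT of the query's logical expression. The function names and sample lists are hypothetical. Each operation walks both lists once, which is what makes sorted inverted lists convenient when lists are very long.

```python
def intersect(a, b):
    """AND: addresses present in both ascending lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def merge(a, b):
    """OR: union of two ascending lists, without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def delete(a, b):
    """AND NOT: addresses of a that do not appear in b."""
    out = []
    j = 0
    for address in a:
        while j < len(b) and b[j] < address:
            j += 1
        if j == len(b) or b[j] != address:
            out.append(address)
    return out


# Hypothetical inverted lists for keys A, B and C.
A = [1, 3, 5, 7, 9]
B = [3, 4, 5, 10]
C = [5]
hits = delete(intersect(A, B), C)           # A AND B AND NOT C
print(f"retrieval statistic: {len(hits)}")  # retrieval statistic: 1
```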

The second stage of the process uses the address records produced by the first
stage and performs all necessary subsequent processing on the accessed records.
This may include molecular formula qualification and structural atom-by-atom
search. The responses from stage two are sorted by query ID number and printed
on the high speed line printer.

In the real time system, shown in (dashed) block C, both the index and the file
are on random access disk. The organization of this system is similar in its
two stage operation to the batched system, differing in the following respects:
(1) The inputs and outputs are via teletype and teletype/Dura Mach chemical
typewriter, respectively; these are connected on-line to the system via data
sets. (2) The queries are processed one at a time as soon as they are received.
(3) The retrieval statistics are returned to the on-line typewriter as soon as
they are computed in stage one. (4) The compound records are retrieved randomly
from the disk and returned immediately to the teletype, where they are punched
on paper tape and can be printed, with structural formula, on the Dura Mach
chemical typewriter.


1.1 THE CIDS RECORD AND DATA STRUCTURES

The data fields of the CIDS records are listed below:

Registry Number
Additional Compound Identification Numbers
Molecular Formula
Connection Table and Abnormality Table
* Structural Formula Image
Structural Keys
Reference Block:
    Nomenclature
    Descriptors
    Security Indicator
    Stereo Indicator

The fields marked by an asterisk appear in the records processed by the
CHEMTYPE system, but not in the CAS record. Separate tape files of the CAS
system contain nomenclature and bibliographic references.

Three basic kinds of data or information structures are represented in this
record: (1) standard alphanumeric, (2) graphs, and (3) pictorial displays. The
graphs are represented by a connection table, which cites the nodes and branches
of the graph along with their values, such as C, O, S, etc., and single, double
and triple bonds. Abnormalities related to specific nodes of the graph, such as
charge, mass and valence, are cited in a correlated table called the abnormality
table. The format of these tables is described in Sections 2.1.2.2 and 3.1.6.2.

The pictorial display data contain the compound structural formula and are
represented in the record as a table which contains every typed symbol (from the
chemical typewriter) of the structure along with a number which gives its
relative location within a display matrix. In the future, it is intended to
replace this memory-consuming representation with a more concise one which
stores only node coordinates, from which all bond symbols can be reconstructed
algorithmically via Cartesian geometry plus a few heuristics. This part of the
record is called the structural formula image.
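To make the structural formula image concrete, the following Python sketch rebuilds a typed formula from (symbol, position) pairs. The report does not give the position encoding, so the sketch assumes, hypothetically, that position = row * width + column in a matrix of fixed width; the function name and the methanol example are invented for illustration.

```python
def render_image(entries, width):
    """Rebuild a typed structural formula from (symbol, position) pairs.

    Assumption (hypothetical encoding): position = row * width + column.
    """
    if not entries:
        return []
    rows = max(position for _, position in entries) // width + 1
    grid = [[" "] * width for _ in range(rows)]
    for symbol, position in entries:
        row, column = divmod(position, width)
        grid[row][column] = symbol
    return ["".join(line).rstrip() for line in grid]


# Hypothetical image of methanol in a 10-column matrix.
methanol = [
    ("H", 2), ("|", 12),
    ("H", 20), ("-", 21), ("C", 22), ("-", 23), ("O", 24), ("H", 25),
    ("|", 32), ("H", 42),
]
for line in render_image(methanol, width=10):
    print(line)
```

Every bond symbol is stored explicitly here, which is the memory cost the planned coordinate-only representation is meant to avoid.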

The remainder of the record is standard alphanumeric data, although the
nomenclature requires an extended symbol set because it contains Greek letters,
upper and lower case letters, plus other special characters.

The representation of molecular formula, structural formula and nomenclature
therefore requires a considerably expanded printer and display font capability.
Both the input devices (Dura Mach and Mergenthaler chemical typewriters) and the
output devices (line printer, Dura Mach and CRT) are designed to meet these
specifications, although their code sets are mutually incompatible. The
character sets and binary code assignments of the Data Products line printer and
the Dura Mach and Mergenthaler chemical typewriters are presented in Appendix F.


2. FILE CONSTRUCTION


This section describes in more detail the preparation of data for use by the
CIDS Retrieval System. Dashed block A in Figure 1 gives a simplified view of the
processing required for data from each of the two accepted sources, the CAS
Registry System and the University of Pennsylvania CHEMTYPE system.

Each of the following subsections describes one of the major phases of
processing in the construction of the Search File. These are (1) CAS Conversion,
(2) Chemical Typewriter Input, (3) Registration, (4) Key Assignment, and
(5) List-Structured File Construction.

Figure 2 gives an overall view of the processing of data received from the CAS
Registry System. The major processing phases required are those numbered (1),
(4) and (5) above. The process blocks in the flow chart are numbered in the same
way and contain the code names of the programs required to perform the
processing.

Figure 3 gives an overall view of the processing required for data entered
through a chemical typewriter. Process blocks numbered (2), (3), (4) and (5)
refer to the major processing phases listed above. The blocks include the code
names of the programs required in each phase.


2.1 CAS CONVERSION

This section describes the programs required to prepare data received from the
CAS Registry System for use by CIDS. The first two programs, CASFMT and CONVRT,
translate data from the CAS Structure Master File to the CIDS format.

The next two programs described, ADDMF and MOLEF, convert the molecular formula
data found in the CAS Bibliography File and add them to the corresponding
compound record. The output of this phase of processing is a compound tape in
the CIDS record format suitable for input to the Key Assignment System.
Figure 2. File Construction From CAS Data


2.1.1 CAS Structure Conversion


Code Name: CASFMT
Programmer: James Gerber

Abstract: CASFMT reads the CAS Structure Master File and translates the
information to the CIDS format. The output of CASFMT is a tape containing the
registry number, connection table, and abnormality table (if present) for each
CAS compound converted.


2.1.1.1 Program Description

CASFMT accepts CAS Structure Master tapes as input. The format of these tapes
can be found in Appendix C. These tapes contain a series of compound
descriptions made up of one to four records of types F1, F2, F3 and F4. The
program first reads a type F1 record and makes a table of "From-Attachments."
Then a type F2 record is read to obtain a list of element symbols. An F3 record
is read to obtain a table of bond types. The registry number and subsidiary
information are read from an F4 record and converted for use as textual
descriptions. After an F4 record has been read, a CIDS connection table is
produced by program CONVRT.

The CAS File is ordered to take advantage of the fact that many compounds have
identical first records, first two records or even first three records. These
identical records are not repeated, so that after reading and converting a type
F4 record (which must be the last record of every description), the next record
read (beginning a new description) need not be of type F1.

The new description may begin with a type F2, F3, or F4 record, indicating that
the omitted leading records are identical to the corresponding records of the
previous compound.
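The record-omission scheme can be sketched as follows. This is a minimal illustration, not CASFMT itself: records are modeled as hypothetical (type, payload) pairs, the function name is invented, and the sketch assumes that within a description the records always appear in the order F1, F2, F3, F4, starting at whichever type is first present.

```python
ORDER = ("F1", "F2", "F3", "F4")


def expand_descriptions(records):
    """Yield complete (F1, F2, F3, F4) payload tuples, one per compound.

    records: iterable of (record_type, payload) pairs. A description may
    start at any record type; omitted leading records are taken to be
    identical to those of the previous compound.
    """
    current = {}
    expected = None  # None: a new description may begin with any type
    for rtype, payload in records:
        if rtype not in ORDER:
            raise ValueError(f"unknown record type {rtype!r}")
        position = ORDER.index(rtype)
        if expected is not None and rtype != expected:
            raise ValueError(f"expected {expected}, got {rtype}")
        if expected is None:
            missing = [t for t in ORDER[:position] if t not in current]
            if missing:
                raise ValueError(f"no previous record to supply {missing}")
        current[rtype] = payload
        if rtype == "F4":
            yield tuple(current[t] for t in ORDER)
            expected = None
        else:
            expected = ORDER[position + 1]


stream = [
    ("F1", "a"), ("F2", "b"), ("F3", "c"), ("F4", "r1"),
    ("F3", "c2"), ("F4", "r2"),   # shares F1 and F2 with the first compound
    ("F4", "r3"),                 # shares F1, F2 and F3 with the second
    ("F2", "b2"), ("F3", "c3"), ("F4", "r4"),
]
for description in expand_descriptions(stream):
    print(description)
```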

After a complete compound description has been read, CASFMT produces X, B and
E tables as input to program CONVRT (Section 2.1.2). It reads and reformats the
modification list (abnormality table), putting the charge, mass, and valence
information into a CIDS abnormality table and the single-atom-addend information
into the CIDS connection table. Program CONVRT then converts the connection
table into CIDS format and renumbers the abnormality table to agree with the
connection table as renumbered by CONVRT.

CASFMT produces CIDS output tapes which are used as input to programs ADDMF and
MOLEF, which add molecular formula information and reformat the records.

A macro flow chart of the program is presented in Figure 4.


2.1.1.2 Program Structure

CASFMT is a main program which calls subroutine CONVRT. The input consists of
the tapes holding the CAS Structure Master File. The output consists of a tape
of compound descriptions in the following format:


Figure 4. Macro Flow Chart - CASFMT
Word      Contents

0         Bits 0-17: Number of rings in structure (binary)
          Bits 18-35: Number of words in connection table (binary)

1-2       CAS registry number (9 characters, right-justified)

3-m       Connection table (CIDS format) (see Section 2.1.2.2)

m+1-n     Abnormality table (if any; the last word is 0)
          (see Section 2.2.11.2)
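Word 0 packs two 18-bit binary counts. In IBM 7040 bit numbering, bit 0 (the sign position S) is the leftmost bit of the 36-bit word and bit 35 the rightmost, so bits 0-17 are the high half and bits 18-35 the low half. A Python sketch, with hypothetical function names:

```python
HALF = (1 << 18) - 1  # mask for an 18-bit half-word


def pack_header_word(n_rings, ct_words):
    """Word 0: ring count in bits 0-17, connection-table length in bits 18-35."""
    if not (0 <= n_rings <= HALF and 0 <= ct_words <= HALF):
        raise ValueError("count does not fit in 18 bits")
    return n_rings << 18 | ct_words


def unpack_header_word(word):
    """Return (number of rings, number of connection-table words)."""
    return (word >> 18) & HALF, word & HALF


word0 = pack_header_word(1, 11)   # hypothetical: one ring, 11-word table
print(format(word0, "012o"))      # 000001000013
```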

Some compounds that are allowed in the CAS system cannot be converted to CIDS
format. These compounds are rejected and the message

REGISTRY NO. DELETED

is printed on the line printer. No action is required. If Sense Switch 6 is in,
the connection table is printed before the deletion message.

2.1.1.3 Operating Instructions

When running this program, tapes must be mounted as follows:

CAS Structure Master: S.SU05

Checkpoint tape (no ring): S.SU25

Output tapes: as specified on $FILE card

At the end of an input reel, the computer will halt after typing a message. If
the reel is not the last, press SS5 in and then start. This causes a checkpoint
to be taken as a safety measure. Then restart with the new reel mounted, using
the checkpoint code typed out. If the reel is the last, leave SS5 out and press
start. This causes the output files to be closed (all remaining output is
written and a file mark is written) and the program to exit. Output reel
switching is automatic. When starting from a checkpoint, the same output reels
that were mounted when the checkpoint was taken should be mounted.

The following sense switch settings alter the program:

SS6: in  - connection tables printed on line printer
     out - no connection table printing

SS5: in  - take checkpoint and terminate job
     out - no action

When a checkpoint is taken, all reel repositioning information is retained.
Restart can be achieved either by a $RESTART card or by an operator interrupt
with the restart code entered into the console keys.
2.1.2 Structure Conversion and Compression


Code Name: CONVRT

Programmer: John D. Leggett

Abstract: The purpose of the program is to convert a structure to a format
suitable for storage and searching. The structure is compressed to facilitate
the atom-by-atom search. To accomplish this compression, carbon atoms with
exactly two direct attachments are removed, and the path lengths and bonding
are indicated. The program will also format structures which are query
fragments, in which case the resulting connection table has redundancy removed
and the atoms are ordered to speed searching. In addition, the various types of
free, or hanging, bonds are formatted.

2.1.2.1 Program Description

The first process is the addition of artificial atoms at the ends of any
hanging bonds in a query fragment. For a hanging bond, a carbon with a "don't
care" number of connections is added. For dashed bonds (which are entered as
type 5 bonds and indicate attachment to C or H), the bond is deleted and a
special indicator is placed on the atom.

The next step is compression by removal of carbon atoms with exactly two
attachments. The list of connections is examined to locate these atoms, which
are then removed. Before removal of any atoms, each connection is considered to
be of path length 1. As an atom is removed, the atoms to which it is connected
are marked as attached to each other by a path whose length is the sum of the
two path lengths incident on the removed atom. The bonds corresponding to these
paths are concatenated and placed with the atoms to which the removed atom is
attached. When all the carbons with two connections have been removed, the
program renumbers the atoms to form a compact set of atom numbers and formats
the connection table.
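The compression step can be sketched in Python as repeated removal of two-attachment carbons from an edge list. This is an illustrative reconstruction, not CONVRT as coded: the function name, the 0-based input numbering and the edge representation (u, v, bonds), with bonds read from u to v, are assumptions. A removed carbon's two bond strings are concatenated into one path, so the new path length is the sum of the two incident lengths. A carbon that already carries a self-loop is skipped, which leaves one carbon standing in an all-two-connection ring such as benzene.

```python
def compress(elements, edges):
    """Remove carbons with exactly two attachments, merging their bonds.

    elements: element symbols indexed by atom (0-based in this sketch).
    edges: list of (u, v, bonds), bonds being bond types read from u to v.
    Returns (elements, edges) for the surviving atoms, renumbered 1..n in
    original order, with every edge written so that u <= v.
    """
    edges = [(u, v, list(b)) for u, v, b in edges]
    alive = set(range(len(elements)))

    def outward(edge, x):
        # Other end of the edge and its bond string read starting from x.
        u, v, bonds = edge
        return (v, bonds) if u == x else (u, bonds[::-1])

    changed = True
    while changed:
        changed = False
        for x in sorted(alive):
            incident = [e for e in edges if x in (e[0], e[1])]
            if (elements[x] != "C" or len(incident) != 2
                    or any(e[0] == e[1] for e in incident)):
                continue
            (a, x_to_a), (b, x_to_b) = (outward(e, x) for e in incident)
            edges = [e for e in edges if x not in (e[0], e[1])]
            # Path a -> x -> b: bonds from a to x, then from x to b.
            edges.append((a, b, x_to_a[::-1] + x_to_b))
            alive.discard(x)
            changed = True
            break

    new_no = {old: i + 1 for i, old in enumerate(sorted(alive))}
    out = []
    for u, v, bonds in edges:
        u, v = new_no[u], new_no[v]
        out.append((u, v, bonds) if u <= v else (v, u, bonds[::-1]))
    return [elements[x] for x in sorted(alive)], sorted(out)


# N#C-CH=C(CH3)2 with atoms 0..5; bond types 1, 2, 3 = single, double, triple.
elements, edges = compress(
    ["N", "C", "C", "C", "C", "C"],
    [(0, 1, [3]), (1, 2, [1]), (2, 3, [2]), (3, 4, [1]), (3, 5, [1])],
)
print(elements)  # ['N', 'C', 'C', 'C']
print(edges)     # [(1, 2, [3, 1, 2]), (2, 3, [1]), (2, 4, [1])]
```

Both the nitrile carbon and the CH carbon are removed, leaving N (atom 1) joined to the branched carbon (atom 2) by a three-bond path of types 3, 1, 2.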

If the structure is a query fragment, CONVRT then orders the atoms in the
connection table on the basis of element kind and atom connection complexity,
such that the most unusual or significant atom appears first. This technique
speeds atom-by-atom search, as the query fails an irrelevant compound more
quickly. Redundant entries are then removed from the connection table.

A macro flow chart of CONVRT is presented in Figure 5.

2.1.2.2 Program Structure

Program CONVRT is a subroutine which is used in many phases of the CIDS system.
The input consists of the connection table in the format described below, the
abnormality table (if any), an indication of whether the structure is a query
fragment, the location in core where the final connection table is to be
placed, and the number of atoms in the structure.
Figure 5. Macro Flow Chart - CONVRT
The input connection table consists of three lists, X, B, and E, in which each
atom and its connections are described in eight-word blocks. The first eight
words of each array are allocated to atom 1, the next eight words to atom 2,
and so on. Each eight-word block in the X list contains the atom numbers for up
to eight connections from that atom, right-adjusted in consecutive words. The
corresponding words in the B list contain the bond type of each connection,
right-adjusted. In the E list, the first word of each group of eight contains
the element kind for that atom, right-adjusted in BCD. In addition, bit 17 is
set to 1 in each word of the E list that corresponds to an entry in the X list.

If the connection is a ring connection, the corresponding E word is set minus.
In the example below, the X, B, E representation is shown in octal, with leading
zeros omitted.


    K-C≡N        (K = atom 1, N = atom 2, C = atom 3)

    X   atom 1:  3  0  0  0  0  0  0  0
        atom 2:  3  0  0  0  0  0  0  0
        atom 3:  1  2  0  0  0  0  0  0

    B   atom 1:  1  0  0  0  0  0  0  0
        atom 2:  3  0  0  0  0  0  0  0
        atom 3:  1  3  0  0  0  0  0  0

    E   atom 1:  1006042  0  0  0  0  0  0  0
        atom 2:  1006045  0  0  0  0  0  0  0
        atom 3:  1006023  1000000  0  0  0  0  0  0
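The X, B, E layout can be generated with the Python sketch below. It assumes the IBM 7090-family BCD letter codes (blank = 60, C = 23, K = 42, N = 45 octal), which reproduce the K and C words of the example; the function names, the bond dictionary input and the ordering of connections by neighbor number are assumptions. Python integers have no sign-magnitude form, so a negative value stands in for a word "set minus".

```python
def bcd_char(ch):
    """Six-bit BCD code for a blank or an upper-case letter (assumed table)."""
    if ch == " ":
        return 0o60
    if "A" <= ch <= "I":
        return 0o21 + ord(ch) - ord("A")
    if "J" <= ch <= "R":
        return 0o41 + ord(ch) - ord("J")
    if "S" <= ch <= "Z":
        return 0o62 + ord(ch) - ord("S")
    raise ValueError(f"no BCD code for {ch!r} in this sketch")


def bcd_symbol(symbol):
    """Element symbol right-adjusted in a 12-bit field (two BCD characters)."""
    text = symbol.upper().rjust(2)
    if len(text) != 2:
        raise ValueError(symbol)
    return bcd_char(text[0]) << 6 | bcd_char(text[1])


CONNECTION_FLAG = 1 << 18  # bit 17 in IBM numbering (bit 0 = leftmost)


def to_xbe(elements, bonds):
    """elements: symbols for atoms 1..n; bonds: {(i, j): (bond_type, in_ring)}."""
    n = len(elements)
    X, B, E = ([0] * (8 * n) for _ in range(3))
    neighbours = {atom: [] for atom in range(1, n + 1)}
    for (i, j), (bond_type, in_ring) in bonds.items():
        neighbours[i].append((j, bond_type, in_ring))
        neighbours[j].append((i, bond_type, in_ring))
    for atom in range(1, n + 1):
        base = 8 * (atom - 1)
        conns = sorted(neighbours[atom])
        if len(conns) > 8:
            raise ValueError(f"atom {atom} has more than eight connections")
        E[base] = bcd_symbol(elements[atom - 1])
        for k, (other, bond_type, in_ring) in enumerate(conns):
            X[base + k] = other
            B[base + k] = bond_type
            E[base + k] |= CONNECTION_FLAG
            if in_ring:
                E[base + k] = -E[base + k]  # stands in for "set minus"
    return X, B, E


X, B, E = to_xbe(["K", "N", "C"], {(1, 3): (1, False), (2, 3): (3, False)})
print([format(w, "o") for w in E[16:18]])  # ['1006023', '1000000']
```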


The output consists of the connection table (C.T.) stored in a block of
consecutive 7040 words. The connection table is divided into three parts: the
connection segment, the bond index, and the bond segment. In addition, the first
word of the C.T. is an index to the three parts. The address (bits 21-35) of the
index word contains the relative location of the bond index segment; the
decrement (bits 3-17) contains the relative location of the bond segment. This
is illustrated below:
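Packing and unpacking the index word involves two 15-bit fields: the address field occupies the low 15 bits (bits 21-35) and the decrement field sits 18 bits higher (bits 3-17). In the sketch below (function names hypothetical), decoding the first word of the octal example at the end of this section, 000016000014, gives a bond index location of 14 octal (12) and a bond table location of 16 octal (14).

```python
FIELD15 = (1 << 15) - 1  # mask for a 15-bit address or decrement field


def pack_index_word(bond_index_loc, bond_table_loc):
    """Address (bits 21-35) = bond index; decrement (bits 3-17) = bond table."""
    for value in (bond_index_loc, bond_table_loc):
        if not 0 <= value <= FIELD15:
            raise ValueError(f"{value} does not fit in a 15-bit field")
    return bond_table_loc << 18 | bond_index_loc


def unpack_index_word(word):
    """Return (bond index location, bond table location)."""
    return word & FIELD15, (word >> 18) & FIELD15


print(unpack_index_word(0o000016000014))  # (12, 14)
```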


    Word          Contents
    1             Index Word (octal 00000x00000y)
    2 to y-1      Connection Segment
    y to x-1      Bond Index
    x onward      Bond Table


Connection Segment: In the connection segment, carbon atoms with exactly two
attachments are not explicitly stored. Their presence is indicated in the C.T.
by path lengths. For example, in a chain N-C=C-C-C that also carries two CH3
groups, the three intermediate carbons each have exactly two attachments and
are not stored, so atom number 1 (N) is connected to atom number 2 (the last
carbon of the chain) by a path of length 4. Likewise, for a file compound the
redundant connection is also indicated: atom 2 is connected to atom 1 by a path
of length 4. Each atom present in the C.T. is stored as follows:

1st word:

    Bits     Contents
    S        0
    1        = 1 if atom is in a ring; = 0 otherwise
    2-5      No. of connections to this atom
    6        = 1 if 1st connection is part of a ring; = 0 otherwise
    7-11     Path length to 1st connection
    12-17    Atom no. of 1st connection
    18-29    Element kind in BCD, right-justified
    30-35    Node type (see below)

2nd word (if necessary):

    Bits     Contents
    S        1
    1-11     0
    12       = 1 if 3rd connection is part of a ring; = 0 otherwise
    13-17    Path length to 3rd connection
    18-23    Atom no. of 3rd connection
    24       = 1 if 2nd connection is part of a ring; = 0 otherwise
    25-29    Path length to 2nd connection
    30-35    Atom no. of 2nd connection

3rd, 4th words (if necessary):

    Same format as 2nd word for the remaining connections.

An atom is node type 1 if it is not carbon, node type 2 if it is a carbon with
more than two attachments, and node type 4 if it is a carbon with one connection
(i.e., a branch end). If a compound contains only carbons with two connections
(e.g., benzene), one atom is chosen as node type 3, and the rest are compressed.
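The first-word and continuation-word layouts can be expressed as shift-and-mask code. This Python sketch is illustrative only: the function names are hypothetical, the sign bit is taken as 0 in the first word and 1 in continuation words, as read from the layout, and connections are passed as (atom number, path length, in ring) triples. As a check, packing a ring carbon with three connections, first connection in a ring to atom 1 at path length 6, element " C" (6023 octal, IBM BCD assumed) and node type 2 gives 234601602302, the first connection-segment word of the octal example at the end of this section.

```python
def _fit(value, width, name):
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


def node_type(element, n_connections):
    """Node type codes 1-4 as listed in the text."""
    if element != "C":
        return 1
    if n_connections > 2:
        return 2
    if n_connections == 1:
        return 4
    if n_connections == 2:
        return 3  # only the one carbon kept in an all-two-connection structure
    raise ValueError("node type for an isolated carbon is not specified")


def pack_first_word(in_ring, n_conn, first, element_bcd, ntype):
    """first = (atom number, path length, in ring) for the first connection."""
    atom, path, ring = first
    return (int(in_ring) << 34                        # bit 1
            | _fit(n_conn, 4, "connections") << 30    # bits 2-5
            | int(ring) << 29                         # bit 6
            | _fit(path, 5, "path length") << 24      # bits 7-11
            | _fit(atom, 6, "atom number") << 18      # bits 12-17
            | _fit(element_bcd, 12, "element") << 6   # bits 18-29
            | _fit(ntype, 6, "node type"))            # bits 30-35


def pack_next_word(second, third=None):
    """Continuation word: S = 1, bits 1-11 zero, two more connections."""
    word = 1 << 35  # sign bit S
    for conn, shift in ((second, 0), (third, 12)):
        if conn is None:
            continue
        atom, path, ring = conn
        word |= (int(ring) << 11
                 | _fit(path, 5, "path length") << 6
                 | _fit(atom, 6, "atom number")) << shift
    return word


word = pack_first_word(True, 3, (1, 6, True), 0o6023, node_type("C", 3))
print(format(word, "012o"))  # 234601602302
```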

Bond Index: The bond index serves the purpose of locating the entries in the
bond table corresponding to each atom in the connection segment. Each entry in
the bond index requires 6 bits. The rightmost 6 bits of the first bond index
word give the location, relative to the head of the bond table, of the start
of the bond entries of the second atom (the entries for the first atom
begin with the first word of the bond table). The format is:

Word 1:

    Bits     Relative location of
    30-35    Atom 2
    24-29    Atom 3
    18-23    Atom 4
    12-17    Atom 5
    6-11     Atom 6
    S-5      Atom 7

Word 2 (if necessary):

    Bits     Relative location of
    30-35    Atom 8
    24-29    Atom 9
    ...      ...

The table continues for as many words as necessary to provide an entry for
each atom. The last entry gives the relative location of the word following
the last word of the bond table.
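Finding one atom's group of bond-table entries from the bond index can be sketched as follows. These are hypothetical Python helpers, not the CIDS routine. They assume the caller knows the number of atoms, and they follow the layout above, in which the k-th entry of a word sits 6k bits from the right. With the bond index of the CONVRT example later in this section (151411070603 and 16, octal) and its 7 atoms, the helpers recover the groups of the 14-word bond table.

```python
def bond_index_entries(words, n_atoms):
    """Unpack the n_atoms 6-bit entries: the starts of atoms 2..n_atoms,
    then the end of the bond table. Slot 0 is bits 30-35, slot 5 bits S-5."""
    entries = []
    for k in range(n_atoms):
        word = words[k // 6]
        slot = k % 6
        entries.append((word >> (6 * slot)) & 0o77)
    return entries


def bond_groups(words, n_atoms):
    """(start, end) offsets into the bond table for atoms 1..n_atoms."""
    bounds = [0] + bond_index_entries(words, n_atoms)  # atom 1 starts at word 0
    return [(bounds[i], bounds[i + 1]) for i in range(n_atoms)]


print(bond_groups([0o151411070603, 0o16], n_atoms=7))
# [(0, 3), (3, 6), (6, 7), (7, 9), (9, 12), (12, 13), (13, 14)]
```

In that example, group 2 (bond-table words 3 to 5: 101, 203, 104) holds atom 2's paths to atoms 1, 3 and 4.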

Bond Table: The bond table consists of a number of groups (one group
for each atom) of bond entries. The location of the beginning of each group is
specified by the bond index table. Each word of a given group represents the
bonds in a path from the given atom to another atom, in the form of a string
of three-bit digits, each of which represents the bond type of one segment of
the path. The rightmost six bits of each word contain the number of the
atom to which the string is connected. For a path of length greater than
10, the bond string is continued in the next word, where bits 30-35 are set to
zero. The compound below:


[Structure diagram, partly illegible in the source: an N≡C-C chain carrying two CH3 groups.]


has  the  bond  table: 

    000000031202    atom 1
    000000021301    atom 2
    000000000103
    000000000104
    000000000102    atom 3
    000000000102    atom 4

A bond of type 4 indicates a non-fixed bond, i.e., a "resonant" bond.
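One bond-table word can be packed and unpacked as sketched below. This is an illustrative Python version, not CIDS code. It assumes that bond types are nonzero three-bit digits, so leading zeros are only padding, and that the digit nearest the owning atom is written first. That reading matches the example above: the entries for atoms 1 and 2, 031202 and 021301, read the same path in opposite directions. Paths longer than 10 bonds, which need a continuation word, are not handled.

```python
MAX_BONDS = 10  # ten three-bit digits fit above the 6-bit atom field


def encode_bond_entry(bonds, atom):
    """Pack bond types (nearest the owning atom first) and the target atom."""
    if not 1 <= len(bonds) <= MAX_BONDS:
        raise ValueError("this sketch handles paths of 1 to 10 bonds only")
    if not 1 <= atom <= 0o77:
        raise ValueError("atom number must fit in 6 bits")
    string = 0
    for bond in bonds:
        if not 1 <= bond <= 7:
            raise ValueError("bond types are nonzero three-bit digits")
        string = (string << 3) | bond
    return (string << 6) | atom


def decode_bond_entry(word):
    """Inverse of encode_bond_entry: returns (bond types, atom number)."""
    atom = word & 0o77
    string = word >> 6
    bonds = []
    while string:
        bonds.append(string & 0o7)
        string >>= 3
    return bonds[::-1], atom


print(oct(encode_bond_entry([3, 1, 2], 2)))   # 0o31202
print(decode_bond_entry(0o21301))             # ([2, 1, 3], 1)
print(decode_bond_entry(0o44444401))          # ([4, 4, 4, 4, 4, 4], 1)
```

Under the same reading, the entry 12105 in the CONVRT example that follows is a three-bond path (types 1, 2, 1) ending at atom 5.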
As an example, the octal representation of the connection table as
formatted by CONVRT for the following compound is shown below:


    Index Word:
        000016000014

    Connection Segment:
        234601602302
        1024601
        030101602302
        1040103
        010102604601
        020102604601
        305
        030304604501
        1070106
        010105604601
        010105604601

    Bond Index:
        151411070603
        16

    Bond Table:
        44444401
        44444401
        102
        101
        203
        104
        202
        102
        12105
        12104
        206
        207
        205
        205
2.1.3 Addition of Molecular Formula


Code  Name:  ADDMF 

Programmer:  Huth  V.  Powers 

Abstract: ADDMF reads a tape of compound connection tables which have
been translated from CAS to CIDS format and are ordered by CAS Registry Numbers.
Molecular formula data from the CAS Bibliography File is added to this tape, and
the compound records are rewritten in CIDS record format.

2.1.3.1 Program Description

ADDMF reads a tape containing compound connection table (C.T.) records
which have been converted to CIDS connection table format from the CAS Structure
Master File. ADDMF calls subroutine MOLEF (Section 2.1.4) to read molecular
formula data from the CAS Bibliography File.

As each compound C.T. record is read, the CAS Registry Number (R.N.) is
given to subroutine MOLEF, which reads the bibliography tape until the molecular
formula record for that compound has been found or until a larger R.N. is read,
indicating that the desired record is not on the tape. When the desired molecular
formula record is found, a Hill and addend formula is formed in the CIDS
format by MOLEF. Compounds for which no M.F. can be found are rejected with error
messages.
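Stepping through two tapes sorted by registry number in this way is a sequential match (a merge). Below is a minimal Python sketch with lists standing in for the tapes. It treats registry numbers as integers that sort in tape order, which is an assumption, and the function name is hypothetical.

```python
def add_molecular_formulas(ct_records, bib_records):
    """Both inputs are (registry_number, payload) pairs in ascending order.
    Returns (matched, rejected); matched holds (reg_no, ct, formula)."""
    bib = iter(bib_records)
    current = next(bib, None)
    matched, rejected = [], []
    for reg_no, ct in ct_records:
        # Read the bibliography tape forward until this number or a larger one.
        while current is not None and current[0] < reg_no:
            current = next(bib, None)
        if current is not None and current[0] == reg_no:
            matched.append((reg_no, ct, current[1]))
        else:
            print("NO MF FOR", reg_no)  # larger R.N. (or end of tape) reached
            rejected.append(reg_no)
    return matched, rejected


cts = [(50, "ct50"), (71, "ct71"), (100, "ct100")]
bib = [(10, "A"), (50, "B"), (80, "C"), (100, "D")]
matched, rejected = add_molecular_formulas(cts, bib)
print(matched)   # [(50, 'ct50', 'B'), (100, 'ct100', 'D')]
print(rejected)  # [71]
```

Neither tape is ever read backward, so each is passed over once. This works only because both are in registry-number order.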

ADDMF stores the R.N., C.T., and newly formatted M.F. in the CIDS compound
record format. Also stored at this time is a count of the total number of rings
in the compound, which is obtained from the output of CASFMT. This is stored
as the first key in the Key block.

A  macro  flow  chart  of  the  program  is  presented  in  Figure  6. 

2.1.3.2 Program Structure

ADDMF  is  a  main  program  which  calls  subroutine  MOLEF.  It  prepares  data  for 
input  to  the  Screen  Assignment  System. 

The input to ADDMF consists of two tape files. One is the output of
CASFMT (Section 2.1.1), which contains C.T. records of the following format:

    Word    Contents
    1       D = No. rings
            A = No. words in C.T.
    2, 3    R.N.
    4       C.T.
    m       Abnormalities (if any)
            (If present, the last word is zero)

Figure 6. Macro Flow Chart

In addition, the CAS Bibliography file is an input which is read by
subroutine MOLEF.

The output of ADDMF is an IOBS type 2 tape. Each compound record is a
logical record. These are grouped into physical records of 1000 words or less.
The CIDS record format follows:


    Word    Bits      Contents
    1       (3-17)    2's C (No. of words preceding Addit. Reg. No.)
            (21-35)   2's C (No. of words in logical record)
    2       (3-17)    2's C (No. of words preceding Abnormality Table)
            (21-35)   2's C (No. of words preceding C.T.)
    3       (3-17)    2's C (No. of words preceding References)
            (21-35)   2's C (No. of words preceding S.F.I.)
    4       (3-17)    2's C (No. of words preceding Keys)
            (21-35)   2's C (No. of words preceding Qualifiers)
    5, 6              Primary Registry Number (BCD)
    7                 Mol Form
    m                 Additional Registry Number
    n                 Structure (C.T.)
    o                 Abnormality Table (if any)
    p                 Structural Formula Image (if any)
    q                 Reference (if any)
    r                 Qualifiers (if any)
    s                 Keys (2 words per key)

Note  that  several  of  the  data  blocks  will  be  empty  (or  have  zero  word  length). 
The  pointers  to  these  blocks  will  point  to  the  location  where  the  data  would  be 
stored  if  present. 
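The header pointers of the CIDS record store word counts as two's complements in two 15-bit fields (bits 3-17 and 21-35). Below is a Python sketch of packing and unpacking one such word. The helper names and example counts are hypothetical, and the bits outside the two fields are simply left zero.

```python
MASK15 = (1 << 15) - 1


def pack_pointer_pair(left_count, right_count):
    """Put 15-bit two's complements of two word counts in bits 3-17 and
    bits 21-35 of a 36-bit word; all other bits are left zero."""
    for n in (left_count, right_count):
        if not 0 <= n <= MASK15:
            raise ValueError("count must fit in 15 bits")
    return ((-left_count & MASK15) << 18) | (-right_count & MASK15)


def unpack_pointer_pair(word):
    """Recover the two word counts from a packed header word."""
    def count(field):
        return -field & MASK15  # two's complement of a two's complement
    return count((word >> 18) & MASK15), count(word & MASK15)


word1 = pack_pointer_pair(7, 120)   # hypothetical counts
print(oct(word1))                   # 0o77771077610
print(unpack_pointer_pair(word1))   # (7, 120)
```

An empty block needs no special case: its pointer still encodes the word count at which its data would start, as the note above describes.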
If the M.F. record for a particular R.N. cannot be found, the error message
"NO MF FOR", followed by the CAS R.N., is typed.

2.1.3.3 Operator Instructions

The C.T. input tapes (output of CASFMT) are loaded sequentially on S.SU06.
When the end of each C.T. input tape is read, a message is printed requesting
that sense switch 3 be pressed in if this is the last input tape. The loading of
the CAS Bibliography tapes and other sense switch settings are described in
Section 2.1.4.3.


2.1.4 Molecular Formula Extraction Program


Code Name: MOLEF

Programmer: Paul R. Weinberg

Abstract: Subroutine MOLEF consists of a package of programs that locate
and extract the file record corresponding to a given registry number from the
CAS Bibliography tapes. Summation and addend molecular formulas are computed
and returned in a format appropriate for the CIDS file.

2.1.4.1 Program Description

Subroutines  within  MOLEF  allow  addressing  characters  within  the  CAS  tapes 
and  positioning  on  a  character  basis.  A  facility  is  also  included  to  collect 
characters.  These  subroutines  are  used  to  find  and  extract  the  CAS  record  for 
a  given  registry  number.  Routine  HILLFM  is  then  used  to  compute  the  summation 
formula  and  addend  formula.  The  subroutines  within  MOLEF  are: 


(1)  HILLFM

     Purpose:  Form the Hill and addend molecular formulas from a
               pre-positioned CAS tape.
     Input:    Index position in POINT of start of formula in buffer.
     Output:   Molecular formula in CIDS format in OUFMLA block.
               The accumulator contains zero at exit if the formula
               is unacceptable.

(2)  COLEC2, COLEC3, COLEC4

     Purpose:  Get 2, 3, or 4 characters, respectively, from the
               current CAS record. Characters are returned
               right-justified in the accumulator. (Zeros fill
               unused positions.)

(3)  CONVRT

     Purpose:  Convert a character number to index register codes.
     Input:    Character number in CHAR.
     Output:   Index register codes (to access the character) stored
               in INDEX. These codes point to the word number and
               the character number within the word of the designated
               character.

(4)  FORWRD

     Purpose:  Positions forward in the tape buffer a given number
               of characters. Repositions physical tape if necessary.
     Input:    Number of characters to be skipped.
     Output:   Length of the current block in CUR. Current register
               setting of the current block in POINT (stored as
               in INDEX).

(5)  REVERT

     Purpose:  Inverse of CONVRT.
     Input:    From INDEX.
     Output:   To CHAR.

(6)  LOCATE

     Purpose:  Finds a registry number on the tape.
     Input:    Registry number in REG and REG + 1.
     Output:   Positioned tape and location of first character
               following the registry number in POINT.
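Character addressing of the kind CONVRT, REVERT and COLEC2/3/4 provide can be sketched in Python. The sketch assumes six 6-bit BCD characters per 36-bit word, numbered from 0 and from the left, and it represents the "index register codes" simply as a (word number, character position) pair. The function names mirror the routines above but are hypothetical stand-ins, not their actual interfaces.

```python
CHARS_PER_WORD = 6  # assumption: six 6-bit BCD characters per 36-bit word


def convrt(char_no):
    """Character number (0-based) -> (word number, position within word)."""
    return divmod(char_no, CHARS_PER_WORD)


def revert(word_no, position):
    """Inverse of convrt."""
    return word_no * CHARS_PER_WORD + position


def get_char(buffer, char_no):
    """6-bit code of one character; position 0 is the leftmost in a word."""
    word_no, position = convrt(char_no)
    shift = 6 * (CHARS_PER_WORD - 1 - position)
    return (buffer[word_no] >> shift) & 0o77


def colec(buffer, char_no, n):
    """Collect n (2 to 4) characters, right-justified with zero fill,
    in the manner described for COLEC2, COLEC3 and COLEC4."""
    if n not in (2, 3, 4):
        raise ValueError("n must be 2, 3 or 4")
    value = 0
    for i in range(n):
        value = (value << 6) | get_char(buffer, char_no + i)
    return value


buffer = [0o212223242526, 0o313233343536]  # hypothetical tape buffer
print(convrt(7), revert(1, 1))              # (1, 1) 7
print(oct(colec(buffer, 4, 4)))             # 0o25263132
```

The collected characters may straddle a word boundary, as the 4-character example does; the word/position split is what lets a routine like FORWRD move through the buffer one character at a time.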

Figure 7 illustrates the internal organization of the molecular formula
data for each fragment or addend. The Hill molecular formula for the compound
is formed from these tables using the following rules:

(1) The coefficient for each fragment is multiplied by the smallest
    number that will make all fractional coefficients integers.
    The result becomes a new coefficient.

(2) The number of hydrogens to be subtracted from each fragment is
    calculated by multiplying the entries in SUBTR by the corresponding
    coefficient.

(3) The summation formula is calculated by multiplying the weights
    in WEIGHT by the appropriate coefficient and summing for each
    element over all fragments.

(4) The number of hydrogens in the summation formula is adjusted
    by subtracting the entries in SUBTR.
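The four rules above, followed by Hill ordering, can be sketched in Python. This is an illustration, not HILLFM. Each fragment is a (coefficient, element counts, hydrogens to subtract) triple standing in for the WEIGHT and SUBTR tables. The Hill ordering used (carbon first, hydrogen second, the rest alphabetical; purely alphabetical when there is no carbon) is the standard convention and is assumed here. The first check uses the compound of Figure 9: C4H8AsClN4S2 with two ClH addends sums to C4H10AsCl3N4S2. The sketch requires Python 3.9+ for `math.lcm`.

```python
from fractions import Fraction
from math import lcm


def summation_formula(fragments):
    """fragments: (coefficient, element_counts, h_subtract) triples, where the
    coefficient may be fractional. Returns element -> count."""
    coeffs = [Fraction(c) for c, _, _ in fragments]
    # Rule 1: the smallest multiplier making every coefficient an integer.
    multiplier = lcm(*(c.denominator for c in coeffs))
    coeffs = [int(c * multiplier) for c in coeffs]
    totals, h_removed = {}, 0
    for k, (_, counts, h_subtract) in zip(coeffs, fragments):
        h_removed += k * h_subtract                     # Rule 2
        for element, n in counts.items():               # Rule 3
            totals[element] = totals.get(element, 0) + k * n
    totals["H"] = totals.get("H", 0) - h_removed        # Rule 4
    return {e: n for e, n in totals.items() if n}


def hill_string(totals):
    """C, then H, then the rest alphabetically; alphabetical if no carbon."""
    if "C" in totals:
        rest = sorted(e for e in totals if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in totals else []) + rest
    else:
        order = sorted(totals)
    return "".join(e + (str(totals[e]) if totals[e] != 1 else "") for e in order)


fig9 = [(1, {"C": 4, "H": 8, "As": 1, "Cl": 1, "N": 4, "S": 2}, 0),
        (2, {"Cl": 1, "H": 1}, 0)]
print(hill_string(summation_formula(fig9)))          # C4H10AsCl3N4S2
hemihydrate = [(1, {"Ca": 1, "S": 1, "O": 4}, 0), (Fraction(1, 2), {"H": 2, "O": 1}, 0)]
print(hill_string(summation_formula(hemihydrate)))   # Ca2H2O9S2
```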

A macro flow chart for the program is presented in Figure 8.

2.1.4.2 Program Structure

MOLEF is a subroutine which is called with a standard MAP CALL statement.
The input to the program is a CAS Registry Number in characters, stored in REG
and REG + 1. Before calling MOLEF the first time, the calling program calls
subroutine INITL to initialize MOLEF (open files, set up error recovery
procedures, etc.).

The output consists of the Hill and addend molecular formula for the
requested compound. The format of this block is described in Section 2.2.4.

The contents of the accumulator at exit indicate the following conditions:

AC = 0 means that the registry number is not on the tape. The
current registry number is stored in REGFD and REGFD+1.

AC = 1 means that the number has been found but the file has been
rejected for some other reason.

AC = 2 means that there are no errors.
[Figure 7 diagram, partly illegible in the source. Recoverable labels: a table
of length 20; pointers to the words containing the fractional parts of the
fragment multipliers; fragment tables beginning at word 1, word 16, word 31,
etc. (150 words in all); and SUBTR, the number of hydrogens to be subtracted
when generating a summation molecular formula (1 entry per fragment).]

Figure 7. Internal Organization of Molecular Formula Data


[Figure 8 flow chart. Recoverable steps: find formula in compound record
(error if no formula given or record deleted); call FORWRD to position to
formula; for each fragment, store elements and weights in fragment tables,
also fractional coefficients and no. of hydrogens to be subtracted.]

Figure 8. Macro Flow Chart - MOLEF


Users of the program should note that FORM10, FORM20, FORM30 and FORM40
are the names of the decks containing MOLEF and related subroutines. The user
is warned that MOLEF alters the system control blocks for utilities S.SU05 and
S.SU07 and restores them at exit. If the program fails and does not exit
normally, the operating system must be reloaded.

The following error messages may be printed when the described
conditions occur:

(1)  END OF FILE READ BY MOLEF

     MOLEF has read a file mark and has returned, assuming the
     registry number does not appear on the tape.

(2)  RECORD DELETED FOR REGISTRY NUMBER XXXXXXXXX

     The file has been located, but the record has been deleted
     by CAS. MOLEF assumes the registry number does not appear
     on the tape.

(3)  NO FORMULA GIVEN BY CAS FOR NUMBER XXXXXXXXX

     MOLEF assumes the registry number does not appear on the tape.

(4)  SUMMATION FORMULA FOR REGISTRY NUMBER XXXXXXXXX
     Deleted due to missing fraction coefficent XXXXXX

     Syntactical error on the part of CAS. Return with 1 in the AC.

(5)  STORAGE ALLOCATION PROBLEM FOR REG. XXXXXXXXX DELETING
     SUMMATION FORMULA

     Not enough buffer space has been assigned to compute the
     summation formula. Return with 1 in the AC.

(6)  TOO MANY FRAGMENTS IN REG NUMBER XXXXXXXXX DELETING
     SUMMATION FORMULA

     More than 10 fragments are not allowed. Return with 1 in AC.

(7)  FORMULA TOO COMPLEX FOR REG. NUMBER XXXXXXXXX DELETING
     SUMMATION FORMULA

     Not enough buffer space. Return with 1 in AC.

(8)  CONTROL WORD OVERFLOW

     More than 18 elements have been found. Not enough room
     in format.

(9)  END OF BUFFER REACHED AT NUMBER XXXXXXXXX
     SUBROUTINE INDEX2 SKIPPING TO NEXT TAPE RECORD

     Program error. Return with 0 in AC.

(10) UNABLE TO FIND REGISTRY XXXXXXXXX
     TAPE POSITION TO XXXXXXXXX

     A number higher than the desired registry number has been located.
     MOLEF returns with the number found in REGFD and REGFD+1. AC
     set to 0.
2.1.4.3 Operating Instructions

MOLEF expects to find the first CAS Bibliography tape on unit S.SU05.
The remainder of the tapes should be mounted alternately on S.SU07 and S.SU05
to allow reel switching to take place. Input is doubly buffered, and reel
switching is automatic.

If sense switch 6 is pressed in, a listing of the output blocks will be
produced.


2.2  CHEMICAL  TYPEWRITER  INPUT 

The purpose of the Chemical Typewriter Input Programs (CHEMTYPE) is to
provide a means of introducing chemical structures into registry files which can
then be used directly with structural key assignment to generate search files.

The  overall  process  consists  of  the  following  steps: 

(1)  Information  about  each  compound  to  be  entered  is  written  by  a 
chemist  in  a  standard  form. 

(2)  A  typist,  using  either  a  Dura  Mach  or  Mergenthaler  Chemical 
Typewriter,  transcribes  the  information  simultaneously  to  typed 
copy  and  to  punched  paper  tape. 

(3) The paper tape image is transcribed by a computer to magnetic
tape.

(4) The CHEMTYPE programs exhaustively analyze the magnetic tape
images and produce files which are suitable as input to a chemical
registry system.

The entire process of entering chemical data requires that care be taken
by the chemist and the typist to ensure accuracy of input. However, a primary
objective of the CHEMTYPE programs was to permit the chemist and the typist the
widest possible latitude both in entering data and in making corrections that
are consistent with the unambiguous interpretation of the paper tape input stream.
Thus, the CHEMTYPE programs recover from a variety of correctable errors and permit
wide variability of input formats. They signal errors only when the input
contains an error they cannot correct.

The chemist provides the original input data for the typist. It is his
job to present the structures to the typist in such a way that she may type
them with no knowledge of chemistry. The rules for the chemist have been
previously stated in ACT II Typing Conventions and ACT III Typing Conventions, and
are consistent with standard structuring conventions. These documents describe
the rules to be applied to the Mergenthaler and Dura Mach typewriters,
respectively. The rules for the chemist are identical in both cases.

Figure  9  is  an  example  of  the  copy  which  the  chemist  produces. 

The typist must type so that all pertinent information is recorded on
digital paper tape. The typewriter input is such that control characters are
punched which later are used to determine the location of each typed character
in a two-dimensional matrix. The typist, therefore, has a certain amount of
freedom to move randomly within the record, since the physical location of
characters is not determined by the strict order in which they appear on paper tape.
The specific rules for the typist to follow have been previously stated in ACT
II Typing Conventions and ACT III Typing Conventions⁶.

The typist is allowed a certain leeway in correcting errors. Procedures
are given for correcting a specific error and for deleting a record which is
partially typed and starting over again.


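[Figure 9: the chemist's input copy. Recoverable content: the molecular formula
C4H10AsCl3N4S2; the addend formula C4H8AsClN4S2·2ClH; the names
"[Arsine, (2-chlorovinyl)bis(guanylmercapto)-]", "The dihydrochloride" (followed
by an illegible line formula) and "Ethylenedithioarsonit, 2-chloro-, diguanyl,
dihydrochloride"; and a structural diagram built around a Cl-C-C-As chain with
two S-C groups, which is not legible in detail.]

Figure 9. Copy Produced by Chemist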


Figure 10 is the hard copy output produced by the typist from the input
copy shown in Figure 9. It must be noted that the nomenclature in both Figure 9
and Figure 10 is inaccurate. It is given to the typist as received, without
editing, since the nomenclature as presented may have served as an index term
on many previous occasions.

The programmer has designed the system to recognize all possible inputs,
from completely meaningless information to good chemical records. A great
deal of error checking has been introduced into the system so that only correct
chemical records are entered into the file. The only real limitation on this
has been the impossibility of checking the accuracy of the nomenclature entered
for each compound, or the accuracy of the local control number (TID), stereo or
classification information. The only way incorrect data of this sort can be
kept out of the file is to visually check all typed copy and note the TIDs of
incorrect records. These records may then be deleted by using the TIDs as
input to a subroutine which prevents these compounds from being entered into the
file during processing.

The accuracy of the molecular formula may be checked, within limits, by the
hydrogen parity rule and by comparison with the typed structure. Any
discrepancy between the molecular formula and the typed structure is assumed
to be due to an error in one or the other.
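The report does not spell out the hydrogen parity rule. One common check of this kind, assumed here, follows from every bond using two valences: a neutral, closed-shell molecule must contain an even number of atoms of odd valence. Below is a Python sketch with a hypothetical and deliberately incomplete set of odd-valence elements. The test formula is the one shown in Figure 9.

```python
# Assumed, incomplete set of elements whose usual valences are all odd.
ODD_VALENCE = {"H", "F", "Cl", "Br", "I", "N", "P", "As", "Sb", "B"}


def passes_parity(counts):
    """True if a formula (element -> count) has an even number of
    odd-valence atoms, as a neutral closed-shell molecule must."""
    odd_atoms = sum(n for element, n in counts.items() if element in ODD_VALENCE)
    return odd_atoms % 2 == 0


print(passes_parity({"C": 4, "H": 10, "As": 1, "Cl": 3, "N": 4, "S": 2}))  # True
print(passes_parity({"C": 4, "H": 9, "As": 1, "Cl": 3, "N": 4, "S": 2}))   # False
```

Like the text says, such a check works only "within limits": a formula with two compensating errors still passes.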

The two kinds of typewriters produce almost identical copy, with the
exception that the Dura Mach has a more restricted character set than the
Mergenthaler, and several of the required symbols must be simulated. There are,
however, enough differences in the manner in which the two typewriters accomplish
this that separate input programs, each tailored to the specific typewriter,
had to be written.


[Figure 10: the typist's hard copy of the record in Figure 9. Recoverable
content: a TID line with the classification (U) and the molecular formula
C4H10AsCl3N4S2; the addend formula; STEREO N; the three nomenclature lines of
Figure 9; two fields introduced by "=" (one reads =1083); and the record
terminator **.]

Figure 10. Copy Produced by Chemical Typist


THE  DURA  MACH 


Since the Dura Mach does not record any coordinate information, the
analysis of structures entered through the Dura Mach must depend on line and
space control punches. As a result, the typist may not move the platen by hand,
as this does not register on the paper tape. Similarly, tab and margin use is
restricted as specified in the typing conventions.

The limited symbol set on the Dura Mach requires the synthesis of certain
symbols. For example, the triple bond is indicated by a single bond overtyped
by an asterisk. This is true for all the triple bonds. The Dura Mach
characters do not include a parity bit, and no error checking is done directly
by the hardware.

THE  MERGENTHALER 


The Mergenthaler typewriter produces coordinates as a result of a backspace,
line advance, carriage return, tab, white ribbon, or moving the platen by hand.
These decode into the x and y coordinates for the first character which was
typed after the last coordinates were produced. As a result, the typist may
move freely within a record, moving the platen by hand at will. There are
certain classes of characters whose coordinates must not only be decoded, but
to which a correction must be applied before the character may be placed into
a 2-dimensional matrix. These are characters which print above or below the
line and whose coordinates are given as though they were typed directly on the
line (see Section 4.1.1).

Since each Mergenthaler character includes a parity bit and the machine
has fairly comprehensive error detection hardware, which causes the keyboard
to lock on the detection of a parity error, fewer mispunches reach the
computer. When parity errors are detected by the computer, it is almost
certainly because either the typist made a correction incorrectly or there
was an error in the paper tape reader when the paper tape was transferred to
magnetic tape.

In addition to the above considerations, certain syntax requirements have
been placed on each of the various fields of typed information. These are
described fully under the program descriptions associated with manipulating
this information.

Figure 11 describes the major functions performed by the CHEMTYPE System.

The  relation  of  each  program  to  the  total  CHEMTYPE  System  is  pictured  in 
Figure  12  and  is  described  in  the  following  paragraphs. 

(1)  Input of Chemical Typewriter Information

     (a)  INPUTD - reads Dura Mach records from magnetic tape and
          translates all of the typed information for a chemical
          compound into Mergenthaler Code and places it in a
          2-dimensional matrix.

     (b)  TAPWRM - does the same for Mergenthaler records.

(2)  Formatting

     (a)  ORGNZR - processes a single chemical record, formatting the
          following fields:

          (1)  Temporary Identification (TID) (Local Control Number)

          (2)  Security classification

          (3)  Stereo information

          (4)  Structural formula image (SFI). If bracketed information
               is present, ORGNZR calls on REGRUP to reorder the
               SFI so that all characters within a single set of
               brackets will appear compactly. If any atoms appear
               outside of the brackets, REGRUP in turn calls on XCESS
               to format this.

          (5)  Molecular formula (by calling MOLFRM)

          (6)  Nomenclature and reference fields (by calling MONIKP,
               which in turn calls PUNCH to punch cards for the
               TOXINFO file).
Figure 12. Interrelation of Programs in the CHEMTYPE System


(3)  Connection Table Generation and Verification

     (a)  SETUP - uses the SFI to find a capital letter in the matrix and
          calls on CLEANM to remove all characters but atoms and bonds
          from the matrix. It creates the Abnormality Table and
          expands any instances of Ph, (CH)n, or (C)n.

     (b)  MAKECT creates the Connection Table.

     (c)  PHASES calls on VERIFY to verify the validity of the Chemical
          Structure, the Abnormality Table and the Molecular Formula.

(4)  Output

     (a)  NFCF expands the Connection Table to a form acceptable as
          input to CONVRT. NFCF calls on DECKA to print the Connection
          Table when a switch is set. NFCF calls on CONVRT to
          transform the Connection Table to compacted format and on
          TICKER to create an output tape containing all the formatted
          information.

     (b)  PIX, DURPIX and LINPIX may be called on to reconstruct the
          structural formula image for output on punched paper tape
          from the teletype, which can then be printed on the Dura Mach;
          for output directly on a Chemical Line Printer; or for
          simulated output on a 1403 line printer. These programs are
          described in Section 3.3, since they are used to output
          responses to queries.

Error messages are printed as each program encounters error conditions in
the typed record. These are discussed in greater depth in Section 5 of the
CHEMTYPE System. A listing of the possible errors appears in Appendix E.

The typewriter translation programs require the entire available core of
about 23,000 locations and must use blocks of storage for more than a single
purpose. The entire CHEMTYPE system of programs is in core at all times.

The system input is a magnetic tape containing Mergenthaler or Dura Mach
code, formatted in fixed 300-word records. Each paper tape constitutes a file
which begins with a six-character tape number whose first character denotes
the source of the typing: E = Edgewood Arsenal, F = Frankford Arsenal,
U = University of Pennsylvania. The last paper tape on a magnetic tape is
followed by a 300-word record of all 7's.

Mergenthaler or Dura Mach characters on magnetic tape are packed as follows
during transcription of the paper tape onto magnetic tape:

    Bits     Contents
    0-3      0
    4-11     Typewriter character
    12-15    0
    16-23    Typewriter character
    24-27    0
    28-35    Typewriter character
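Reading the packed characters back out of a tape word can be sketched in Python. The sketch assumes a reading of the partly legible packing diagram in which three 8-bit typewriter characters occupy bits 4-11, 16-23 and 28-35 (bit 0 leftmost), each preceded by four zero bits; the helper names are hypothetical. The end-of-tape test follows the text: a 300-word record of all 7's.

```python
END_WORD = 0o777777777777   # a word of all 7's
RECORD_WORDS = 300


def unpack_typewriter_word(word):
    """Three 8-bit characters from bits 4-11, 16-23 and 28-35 (bit 0 leftmost)."""
    return [(word >> shift) & 0xFF for shift in (24, 12, 0)]


def pack_typewriter_word(chars):
    """Inverse of unpack_typewriter_word; the 4-bit fillers stay zero."""
    c1, c2, c3 = chars
    if not all(0 <= c <= 0xFF for c in chars):
        raise ValueError("characters must fit in 8 bits")
    return (c1 << 24) | (c2 << 12) | c3


def is_end_record(record):
    """True for the 300-word record of all 7's that ends a magnetic tape."""
    return len(record) == RECORD_WORDS and all(w == END_WORD for w in record)


w = pack_typewriter_word([0x41, 0x82, 0x13])
print(unpack_typewriter_word(w))                  # [65, 130, 19]
print(is_end_record([END_WORD] * RECORD_WORDS))   # True
```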

Data card input consists of a card giving the number of records on the input
tape to be skipped, and cards giving the TIDs of typed records to be deleted
during processing. These must be followed by one blank card. The format of
the TIDs to be deleted is as follows:

(Left justified)


The output tape which is created by this system consists of variable-length
records, each record consisting of a single chemical record. The
format is presented in Figure 13.


[Figure 13 layout, largely illegible in the source. Recoverable content:
header words 0 to 6 holding the number of words in the block (not including
word 0); the 2's complements of the first locations of MOLTAB, the
abnormality table (0 if not present), the SCRUB list, the connection table,
and the nomenclature; the BCD characters 010; the number of rings; the
registry number (first 6 characters, second 6 characters); and CLSTER
(classification and stereo). These are followed by the MOLTAB block, the
connection table block, the abnormality table block (if present), the
nomenclature and reference block, and the SFI block, whose header holds DELY,
DELX, the total charges outside of brackets, and a bit set if an underline
table exists.]

Figure 13. CHEMTYPE Output Format For One Chemical Record


2.2.1 Input Program

Code Name: TAPWRM

Programmer: Helen Hill

Abstract: TAPWRM reads typewriter characters in Mergenthaler Code from
a magnetic tape. It interprets these codes and constructs a 2-dimensional
array containing an image of the typed chemical record.

2.2.1.1 Program Description

A macro flow chart describing this program is presented in Fig. 15.

TAPWRM  reads  input  cards  giving  the  number  of  magnetic  tape  records  to  be 
skipped  before  processing  begins  and  the  TID's  of  compounds  to  be  deleted  by 
the  program.  It  then  begins  reading  the  Mergenthaler  character  stream  from 
magnetic  tape. 

Parity is checked on every typewriter character, and the program looks for
one of the following at the beginning of a paper tape: (a) a case character,
(b) a lozenge, (c) an E, F, or U indicating that a paper tape number follows.
Each case character encountered is identified and the current case is stored.
When an E, F, or U is found, the tape number which follows is formatted and
printed on the line printer. (The tape number is not necessarily present.)
The presence of a lozenge (◊) indicates the beginning of a chemical record.

The remainder of the chemical record is then read, one character at a time,
and the characters are stored in the proper locations in the matrix until a
box or ** is encountered. The presence of a box indicates that the typist
will begin the record anew and the record to this point should be ignored.

The ** indicates that the chemical record has been completed. When the ** is
reached, the program continues reading in characters until the next lozenge
is found, since the typist may correct the previous record at any time before
typing the next lozenge. The program makes corrections on the presence of a
white ribbon punch, which indicates that any characters found before the
occurrence of a black ribbon punch are to be erased. Any characters which are
found to be typed into a location that already contains a character and which
are not legitimate overtypes (special characters not included in the character
set) are considered to be corrections and replace the character already in
the matrix. Legitimate overtypes are:


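[Table of legitimate overtypes, not legible in the source. For each
triple-bond symbol (including the diagonal \\\ and vertical ||| forms), it
gives the two overtyped character combinations by which the symbol may be
typed.]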
These are replaced in the matrix by the correct triple bond. When bond
crossings such as -|- are encountered, one bond must be erased, leaving a
single bond (for example |) in the matrix. In the case of underlines (which
print in the same matrix location as the character underlined), the character
is made minus and the underline dropped. If any bond is typed into a location
containing a bracket corner, it does not replace the bracket corner. Since
typists have been found typing bonds in the same location as the bracket
corners, the program was altered to prevent this from erasing the corner
itself.

The actual input into the matrix is accomplished by reading characters and storing them in a work area until coordinate punches are encountered, which give the coordinates of the first character now occupying the work area.

Each set of coordinates appears as a series of six paper tape characters. The first signals that coordinates are coming, the next two decode into the y coordinate, the next two into the x coordinate, and the last is blank tape. Using the decoded coordinates, the characters are placed in the appropriate locations in the matrix after the proper analysis has been made. The following classes of characters are recognized and the proper corrections made: characters which are typed while the carriage is in one location but which actually print one or two spaces below the line, characters for which the carriage does not advance, and double symbols typed in a single location.

These classes are as follows:

(1) Characters whose Y coordinate is given for the line on which the carriage rests, but which print one Y coordinate below this. These are:

sub  case  /  \  ^  \ 

upper  case  \  / 

(2) Characters which print two Y coordinates below the Y coordinate registered for the carriage location:

sub  case  = 

sub  case  ■*— 

carbon  dot  • 

(3) "Non-escaping" characters, for which the carriage does not move after typing but instead stays in the same position for the typing of the next character. There are three special groups of these characters which, in addition to being "non-escaping," have one of the following characteristics:

(a) The character types one y coordinate below the coordinate registered for the position of the carriage.

sub  case  II  >  I 

upper  case  I  >  11 

(b) The character types one y coordinate higher than the carriage position.

lower case ••

The virgule is one non-escaping character which, because of its special use, must be mentioned separately. The non-escaping characteristic requires that single-digit fractions be typed in the following manner: virgule, numerator, denominator; and, in the case of a two-digit numerator, as follows: first digit of numerator, virgule, second digit of numerator, denominator.

(4) Special double characters for which the registered y coordinate is correct for the upper half of the character but for which the lower half prints one y coordinate below the registered coordinate:

sub  case  T  It  (used  in  typing  benzene  ring,  etc.) 
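The four classes above reduce to a vertical offset, a flag saying whether the carriage escapes, and a flag for two-row characters. The following is a minimal Python sketch of placement under those rules. The CharClass record, the dictionary used as the matrix, and the assumption that double characters advance the carriage are illustrative choices, not TAPWRM's actual logic. The fraction helper encodes the virgule typing order given for class (3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharClass:
    dy: int                # rows below (+) or above (-) the registered y coordinate
    escaping: bool         # False if the carriage stays put after the character
    double: bool = False   # two-row character: upper half at y, lower half at y + 1


NORMAL = CharClass(0, True)
ONE_BELOW = CharClass(1, True)                 # class (1)
TWO_BELOW = CharClass(2, True)                 # class (2)
NON_ESCAPING = CharClass(0, False)             # class (3), e.g. the virgule
NON_ESCAPING_BELOW = CharClass(1, False)       # class (3)(a)
NON_ESCAPING_ABOVE = CharClass(-1, False)      # class (3)(b)
DOUBLE = CharClass(0, True, double=True)       # class (4); escaping is assumed


def place(symbols, x, y, classify, matrix):
    """Place symbols typed from carriage position (x, y) into matrix.

    classify(symbol) returns a CharClass; matrix is a dict keyed by (x, y).
    Returns the carriage x position after the last symbol.
    """
    for sym in symbols:
        cls = classify(sym)
        row = y + cls.dy
        if cls.double:
            matrix[(x, row)] = sym + "/upper"
            matrix[(x, row + 1)] = sym + "/lower"
        else:
            matrix[(x, row)] = sym
        if cls.escaping:
            x += 1
    return x


def fraction_keystrokes(numerator: str, denominator: str) -> list[str]:
    """Typing order for a fraction, given that the virgule does not escape."""
    if len(numerator) == 1:
        return ["/", numerator, denominator]
    if len(numerator) == 2:
        return [numerator[0], "/", numerator[1], denominator]
    raise ValueError("only one- and two-digit numerators are described")
```

For example, a non-escaping character that prints one row low leaves the carriage in place, so the next ordinary character lands in the same column on the registered row.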

The only analysis of fields in the matrix that TAPWRM does is to locate the word STEREO, which tells the program that the end of the structure has been found, and to locate the ** which indicates the end of the chemical record.

Since the word STEREO may be forgotten and typed in later, if the program does not find STEREO as it goes along, a rescan for the word STEREO is done when the ** is found.
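The on-the-fly search and the rescan can be sketched as follows. This sketch assumes that "found as it goes along" means the six letters were typed as one unbroken run, and it models the finished matrix as one string per row. Neither assumption comes from the report.

```python
WORD = "STEREO"


def stereo_on_the_fly(typed):
    """typed: (x, y, ch) in typing order. Find STEREO typed as one unbroken run."""
    n = len(WORD)
    for i in range(len(typed) - n + 1):
        x0, y0, _ = typed[i]
        if all(ch == WORD[k] and x == x0 + k and y == y0
               for k, (x, y, ch) in enumerate(typed[i:i + n])):
            return x0, y0          # location of the S in STEREO
    return None


def stereo_rescan(rows):
    """Rescan the finished matrix (one string per row) for STEREO."""
    for y, row in enumerate(rows):
        x = row.find(WORD)
        if x >= 0:
            return x, y
    return None


def locate_stereo(typed, rows):
    """Try the on-the-fly result first; fall back to a rescan when ** is reached."""
    found = stereo_on_the_fly(typed)
    return found if found is not None else stereo_rescan(rows)
```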

2.2.1.2 Program Structure

TAPWRM utilizes the following subroutines:

(1) INPUT - calls for the next 300-word logical record to be read from magnetic tape into a buffer and looks for the terminating record of all 7's. When this is found, it sets a bit and returns to the program to finish processing the last record before exiting.

(2) NEXT - unpacks one 7040 word, placing the next eight-bit character into the accumulator after stripping off the parity bit. When the contents of a 300-word buffer have been exhausted, it calls INPUT to get the next record. NEXT checks for parity errors on each input character. If a parity error is found on input, a bit is set, a parity error counter is incremented, and the record is deleted. A count of the total parity errors encountered between two correct compounds is printed out when compounds are entered into the file.

(3) CALCXY - calls NEXCOR to get each next coordinate word. It unpacks the coordinates (the next 5 input words), which come in as eight-bit Mergenthaler characters, and calculates the x and y coordinates. If the last character in the coordinates is not a zero (blank punch), CALCXY rejects the record. If a code delete is found in the coordinates, CALCXY assumes that a parity error was detected by the typewriter and that the typist retyped the last typed block. This block is erased and the program returns to processing. A parity error in coordinate input results in an error message and rejection of the record.

(4) NEXCOR - calls NEXT to get the next input word of coordinate input, checks for the presence of either a code delete or a parity error, and strips off the low-order bit of the coordinate word.




(5) YCOORD - recalculates the input y coordinate, modulo 132, taking the y coordinate of the lozenge as equal to 1.

(6) CASES - determines whether an input character is a case character and, if so, stores the current case.

(7) STORE - stores the character found in the accumulator in the next space in the work area.

(8) TERMIN - handles a normal end-of-tape situation.

(9) EOFEXI - is entered when the end of a paper tape image is encountered (indicated by the presence of a file mark). It writes the following on the line printer:

(a) total number of records processed (count of lozenges encountered);

(b) total number of records deleted by the typist;

(c) total number of records deleted by the program;

(d) total number of records entered into the file.

(10) ERASE1 - erases a record which is rejected before the system exits from TAPWRM. All necessary locations are reinitialized.

(11) ERASE - erases records which are rejected after the system exits from TAPWRM. All necessary locations are reinitialized.
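The coordinate handling in CALCXY, NEXCOR and YCOORD can be sketched as follows. Figure 14, which gives the real punch code, is not reproduced here. The code-delete value, odd parity over eight bits, and the base-128 combination of the two characters per coordinate are therefore hypothetical stand-ins. Only these parts come from the text: the group layout (y, y, x, x, blank), the stripping of the low-order bit, the rejection and retype rules, and the modulo-132 correction.

```python
CODE_DELETE = 0xFF   # assumption: all eight channels punched
BLANK = 0x00         # blank tape
DIGIT_BASE = 128     # assumption: each coordinate character carries a 7-bit digit
LINES_MOD = 132      # y coordinates are recalculated modulo 132 (from the text)


class Reject(Exception):
    """The record is rejected (bad terminator or parity error)."""


class Retyped(Exception):
    """A code delete was found: erase the last typed block and keep processing."""


def parity_ok(c: int) -> bool:
    # Assumption: odd parity over all eight bits of the character.
    return bin(c).count("1") % 2 == 1


def nexcor(c: int) -> int:
    """Check one coordinate character, then strip its low-order bit."""
    if c == CODE_DELETE:
        raise Retyped()
    if not parity_ok(c):
        raise Reject("parity error in coordinate input")
    return c >> 1


def calcxy(group: list[int]) -> tuple[int, int]:
    """Decode the five characters after the coordinate flag: y, y, x, x, blank."""
    if len(group) != 5:
        raise Reject("incomplete coordinate group")
    if CODE_DELETE in group:
        raise Retyped()
    y_hi, y_lo, x_hi, x_lo = (nexcor(c) for c in group[:4])
    if group[4] != BLANK:
        raise Reject("last coordinate character is not blank")
    return x_hi * DIGIT_BASE + x_lo, y_hi * DIGIT_BASE + y_lo


def ycoord(y_in: int, y_lozenge: int) -> int:
    """Recalculate y modulo 132 so that the lozenge's row becomes 1."""
    return (y_in - y_lozenge) % LINES_MOD + 1
```

With this model, a y register that wraps past 131 still yields a small positive row number relative to the lozenge.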


TAPWRM takes as input Mergenthaler characters read from magnetic tape and packed three to a 7040 word as follows:

Bits 0-3: 0
Bits 4-11: Mergenthaler character
Bits 12-15: 0
Bits 16-23: Mergenthaler character
Bits 24-27: 0
Bits 28-35: Mergenthaler character


Physical records are 300 7040 words long (900 characters). If a paper tape ends before the end of a physical record, the record is filled out with zeroes. Each paper tape record is followed on magnetic tape by a file mark. The final paper tape on a magnetic tape appears as follows:


(1) paper tape characters

(2) physical record filled with zeroes after the end of the paper tape

(3) file mark

(4) a physical record containing all 7's

(5) a second file mark
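The tape-side unpacking can be sketched as follows. The sketch assumes three 8-bit characters per 36-bit word in bits 4-11, 16-23 and 28-35, where IBM bit 0 is the high-order bit. Each file mark is represented as None. The code pads short records with zero words and stops at the all-7's record, as INPUT does. Parity stripping is left out because the report does not say which bit carries parity.

```python
RECORD_WORDS = 300
ALL_SEVENS = 0o777777777777          # a 36-bit word of octal 7s
CHAR_SHIFTS = (24, 12, 0)            # bits 4-11, 16-23, 28-35 (bit 0 = high-order)


def unpack_word(word: int) -> list[int]:
    """Split one 36-bit word into its three 8-bit Mergenthaler characters."""
    return [(word >> s) & 0xFF for s in CHAR_SHIFTS]


def pad_record(words: list[int]) -> list[int]:
    """Fill out a short physical record with zero words."""
    if len(words) > RECORD_WORDS:
        raise ValueError("physical record longer than 300 words")
    return list(words) + [0] * (RECORD_WORDS - len(words))


def is_terminator(record) -> bool:
    return bool(record) and all(w == ALL_SEVENS for w in record)


def characters(records):
    """Yield characters until the all-7's record; None stands for a file mark."""
    for record in records:
        if record is None:
            continue
        if is_terminator(record):
            return
        for word in record:
            yield from unpack_word(word)
```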


All Mergenthaler characters appearing on the paper tape also appear on the magnetic tape. Each paper tape will have a tape number of up to six characters, which will be found at the beginning of the tape image. The first character is:

(1) E for Edgewood tapes

(2) F for Frankford tapes

(3) U for University tapes

Each Mergenthaler character is eight bits and may be a typed character, a case character (upper, lower, or sub case), a space, a control character (white ribbon, black ribbon, code delete, blank tape), or part of a six-character coordinate punch, as illustrated in Figure 14.


Figure 14. MERGENTHALER COORDINATE PUNCH CODE (BINARY)


TAPWRM also requires input cards which specify the following:

(1) the record number on magnetic tape at which processing is to begin;

(2) the registry numbers of compounds to be deleted on processing, and the paper tape number on which they were typed;

(3) a blank card.


Outputs from TAPWRM are as follows:

(1) MATRIX--a 10000-location matrix containing one complete chemical record in Mergenthaler code with an added case bit. Each location containing a character is formatted as follows:

Bit S (sign): 1 for an underlined character
Bits 27-28: case bits (SUB = 01, LOWER = 10, UPPER = 11)
Bits 29-35: input character without parity


(2) TAPE Number--printed on the line printer at the beginning of each paper tape which was numbered.

(3) REGTAB--a table of two-word TIDs in BCD of compounds to be deleted by ORGNZR.

(4) TAPTAB--table of paper tape numbers associated with the registry numbers in REGTAB. Five registry numbers are associated with each paper tape number as follows:

TAPTAB (BCD)         REGTAB (BCD)
paper tape number    REGISTRY NUMBER 1
                     ...
                     REGISTRY NUMBER 5

There may be more than one entry for a single paper tape.


(5) TAPXR--pointer to the last entry in TAPTAB.

(6) WRIREG--indicator made minus if the registry number is to be printed out when it has been formatted by ORGNZR (usually associated with an error message).

(7) DELX--horizontal size of the matrix.

(8) STELOC--two's complement of the location within the matrix of the S in STEREO.

(9) MTXLOC--location used to hold the current matrix location (two's complement pointer) during processing.

(10) WIPOUT--made minus if the record is to be deleted after the registry number has been formatted and printed (due to an error in input).
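The MATRIX cell format and the pointer fields can be illustrated with a short sketch. It makes three assumptions:

- The cell layout is: sign bit = underline, bits 27-28 = case (SUB 01, LOWER 10, UPPER 11), bits 29-35 = 7-bit character.
- The matrix is stored row by row with DELX locations per row.
- The two's-complement pointers in STELOC and MTXLOC are 15-bit address quantities.

The last two points are assumptions, not statements from the report.

```python
SUB, LOWER, UPPER = 0b01, 0b10, 0b11   # case codes given for the MATRIX layout
SIGN = 1 << 35                          # bit S: set ("minus") for an underlined character
ADDRESS_MASK = 0o77777                  # assumption: 15-bit address field
MATRIX_SIZE = 10000


def pack_cell(char7: int, case: int, underlined: bool = False) -> int:
    if not 0 <= char7 < 0o200:
        raise ValueError("character code must fit in 7 bits")
    if case not in (SUB, LOWER, UPPER):
        raise ValueError("unknown case code")
    word = (case << 7) | char7           # bits 27-28 case, bits 29-35 character
    return (word | SIGN) if underlined else word


def unpack_cell(word: int) -> tuple[int, int, bool]:
    return word & 0o177, (word >> 7) & 0b11, bool(word & SIGN)


def location(x: int, y: int, delx: int) -> int:
    """Relative matrix location of (x, y), assuming rows of DELX locations."""
    loc = y * delx + x
    if not (0 <= x < delx and 0 <= loc < MATRIX_SIZE):
        raise IndexError("outside the matrix")
    return loc


def coords(loc: int, delx: int) -> tuple[int, int]:
    return loc % delx, loc // delx


def twos_complement(loc: int) -> int:
    """Two's-complement pointer, as held in STELOC or MTXLOC (15-bit assumption)."""
    return (-loc) & ADDRESS_MASK
```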
Figure 15. Chart - TAPWRM (continued)
2.2.2 Dura Mach Input Program

Code Name: INPUTD

Programmer: Bruce Hack

Abstract: This program accepts magnetic tape images of the paper-tape chemical records typed by the Dura Mach chemical typewriter and reconstructs each chemical record in a two-dimensional array called MATRIX.

2.2.2.1 Program Description

A macro flow chart describing this program is presented in Figure 16.

Since it is sometimes desired to skip a certain number of physical records on the magnetic tape before beginning to process the input data, the program first reads from a card the number of physical records to skip. The size of the record is specified on the $FILE card.

Certain chemical records may have been seen on the hard copy to be in error before computer processing began. These records may be deleted by inserting cards following the card containing the number of physical records to be skipped.

The tape number is formatted and printed at the beginning of each paper tape. At the end of each paper tape, a data block is printed containing the number of records entered in the file and the number rejected.

The program finds the beginning of the chemical record to be processed, which is indicated by a special Dura Mach code, a leading wedge (>). Each typed character is then entered into a two-dimensional array called MATRIX. A Cartesian coordinate system is used, so that each typed character occupies a particular (x,y) coordinate. Two coordinate registers are maintained, which are incremented or decremented as a result of control character punches such as index, space, reverse index, etc. The occurrence of a typed character increments the x register by 1. Characters which are found to belong in an already occupied matrix location are allowed to replace the character already there, with the exception of the legitimate overtypes described in Section 2.2.1.1. The end of a record is signalled by a double asterisk (**) in the input stream. The MATRIX is then converted to an internal code (Mergenthaler code with an added case bit) so that the input from several devices looks the same to the programs which follow.

Field pointers are then set up and the program exits.
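The following is a minimal model of the two coordinate registers, for illustration only. The token names SPACE, INDEX and REVERSE_INDEX are hypothetical, as are the choice that INDEX moves down one row (y + 1) and the overtype table. The replace-unless-legitimate-overtype rule and the increment of x per typed character follow the text.

```python
from dataclasses import dataclass, field

# Hypothetical token names for the control punches named in the text.
MOTIONS = {"SPACE": (1, 0), "INDEX": (0, 1), "REVERSE_INDEX": (0, -1)}


@dataclass
class Carriage:
    x: int = 0
    y: int = 0
    matrix: dict = field(default_factory=dict)
    # (existing, new) -> combined symbol for legitimate overtypes; contents hypothetical
    overtypes: dict = field(default_factory=dict)

    def feed(self, token: str) -> None:
        if token in MOTIONS:
            dx, dy = MOTIONS[token]
            self.x += dx
            self.y += dy
            return
        here = (self.x, self.y)
        old = self.matrix.get(here)
        if old is not None and (old, token) in self.overtypes:
            self.matrix[here] = self.overtypes[(old, token)]
        else:
            self.matrix[here] = token      # a later character replaces an earlier one
        self.x += 1                        # a typed character advances x by 1
```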

2.2.2.2 Program Structure

Input to INPUTD consists of cards as described in Section 2.2.1.2 and magnetic tape containing 8-bit Dura Mach characters formatted as described in Section 2.2.1.2.

Output from INPUTD is the same as that described in Section 2.2.1.2.


Figure 16. Macro Flow Chart - INPUTD


2.2.3 Field Recognizer and Format Program

Code Name: ORGNZR

Programmer: Helen Hill

Abstract: ORGNZR takes the reconstructed matrix (in Mergenthaler code with a case bit added) and recognizes each field in the chemical record. It formats the temporary identification number, the security classification, the molecular formula (both the Hill and the addend molform, if the latter is present), the structural formula image, the stereo information and the nomenclature.

2.2.3.1 Program Description

A macro flow chart describing this program is presented in Figure 17.

The program takes as input the reconstructed matrix containing one chemical record stored in Mergenthaler code with a case bit added, and proceeds as follows:

(1) The TID (the first number or letter in the matrix) is found and formatted. The TID must meet the following requirements:

(a) The number may not exceed 12 characters.

(b) The number may contain only upper or lower case numbers, upper or lower case letters, commas, periods, and dashes.

(c) The end of the TID must be signalled by a space, and no spaces are allowed within the TID. The TID is stored in BCD in two words starting with REGNO. If WIPOUT has been set by the input program as a result of an error condition, the TID is printed and control is returned to the input program to reinitialize and get the next record.

(2) ORGNZR then expects the security classification to follow the TID. The presence of the security classification is signalled by a left parenthesis, and its end by a right parenthesis or by blanks. The program accepts one of the following in the classification field: (C), (U), (c), (u), (), and allows for the addition of other codes in the future. If the first character following the TID is not a left parenthesis, it is assumed that the classification is missing; a special code for a missing classification is assigned (Sec. 2.2.3.2), and the program continues to (3).

(3) The program expects the Hill molecular formula to follow the security classification and to begin with either an upper case letter or a number. ORGNZR then calls on MOLFRM to format the molecular formula. If the first character after the classification is not a number, a capital letter or a left parenthesis, the record is rejected.


(4) The program examines the first character following the Hill molecular formula. If this is an upper left bracket, it checks to see whether a single bond is present immediately below it. If the bond is present, the bracket signals the beginning of the structure of a bracketed compound and the program proceeds to (5). If no bond is present beneath an upper left bracket, or if the first character following the molecular formula is a colon, the addend molecular formula is present and ORGNZR calls MOLFRM to format this also.

(5) If the first character following the molform is not a colon or an upper left bracket, it is assumed that this is the beginning of the structural formula. In this case, a list is constructed containing each structural character and its relative matrix location. This is the structural formula image (SFI), which is stored for future reconstruction of the structure for output.

If brackets are found to have been typed around any portion of the structure, they are erased except for the upper left and lower right corners, whose coordinates will be used to reconstruct the brackets for structural output in PIX and DURPIX. The x coordinate of each lower right corner bracket is stored in table BXBRAK, to be used by REGRUP in reordering the SCRUB list. Any multiplier found to the right of such a bracket corner is stored in table MULTAB, to be used by VERIFY. If a dot is found not connected to a bond, it is assumed to indicate a monovalent salt, and its x coordinate is stored in the bracket table. The last entry in this table is always the maximum x coordinate in the matrix.

(6) It is assumed that the structure must end before the STEREO field is reached (the location of this field is provided by TAPWRM or INPUTD). When the STEREO field is reached, the information stored there is formatted.

(7) The nomenclature is assumed to start one or more lines under the S in STEREO, or one or two spaces to the right of this. When the nomenclature is located, ORGNZR calls MONIKR to format this field, whose end is signalled by the double asterisk at the end of the chemical record.

(8) During the formatting by ORGNZR, the following error conditions are considered cause for rejection of the record:

(a) The TID contains an inadmissible character or more than 12 characters.

(b) The classification contains an inadmissible character.

(c) The molecular formula begins with a lower case letter.

(d) Molecular formula syntax errors were found in MOLFRM.

(e) An unidentified character was found in the matrix.

(f) The structural formula extends beyond the STEREO field.

(g) The structure extends into the molecular formula line.

(h) Too many characters were found for the SCRUB list.

(i) The addend molform is missing. At present, it is assumed that if a bracketed structure is present and there is no charge in the structure, an addend molform must be present.

(j) Stereo information does not match the admissible codes.


(9) If brackets were found during the formation of the SFI, or if a dot was found not connected to any other character (indicating a monovalent salt), REGRUP is called to reorder the SFI so that all characters within a set of brackets appear compactly in the list. If, in addition, it is found that there are atoms outside the brackets, REGRUP in turn calls EXCESS to format this information, which will later be used by the chemical verification program.
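Steps (1) and (2) above can be sketched as two small readers over a linear rendering of the first matrix line. The following are assumptions:

- the blank-padded two-word split of the TID (six BCD characters per 36-bit word);
- the MISSING stand-in for the special missing-classification code;
- skipping blanks before the left parenthesis.

Distinctions between upper and lower case digits are not modelled.

```python
import string

TID_CHARS = set(string.ascii_letters + string.digits + ",.-")
MAX_TID = 12
ACCEPTED_CLASSES = {"C", "U", "c", "u", ""}
MISSING = "MISSING"   # hypothetical stand-in for the missing-classification code


def read_tid(text: str) -> tuple[tuple[str, str], str]:
    """Read the TID at the start of text; return its two 6-character words and the rest."""
    end = text.find(" ")
    if end <= 0:
        raise ValueError("TID missing or not terminated by a space")
    tid = text[:end]
    if len(tid) > MAX_TID or not set(tid) <= TID_CHARS:
        raise ValueError(f"inadmissible TID {tid!r}")
    padded = tid.ljust(MAX_TID)          # assumption: left-justified, blank-filled
    return (padded[:6], padded[6:]), text[end:]


def read_classification(text: str) -> tuple[str, str]:
    """Read (C), (U), (c), (u) or (); return (code, rest of text)."""
    text = text.lstrip(" ")
    if not text.startswith("("):
        return MISSING, text
    body = text[1:]
    # The field ends at a right parenthesis or at a blank.
    end = next((i for i, ch in enumerate(body) if ch in ") "), len(body))
    code = body[:end]
    if code not in ACCEPTED_CLASSES:
        raise ValueError(f"inadmissible classification {code!r}")
    skip = 1 if body[end:end + 1] == ")" else 0
    return code, body[end + skip:]
```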


2.2.3.2 Program Structure

ORGNZR requires 4156 core locations and includes the following tables, which are also used by other programs:

(1) AXACT2 - Mergenthaler input codes without a parity bit but with a case bit added.

(2) INT - Internal Code (modified Dura Mach) counterparts of table AXACT2.

(3) AXBCDN - BCD numbers

(4) AXBCD - BCD letters
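AXACT2 and INT are described as counterpart tables, which suggests a parallel-table lookup: find the code's index in one table and read the same index in the other. The three entries below are invented placeholders, not the real codes.

```python
# Hypothetical three-entry excerpts; the real tables cover the whole character set.
AXACT2 = [0o101, 0o102, 0o303]   # Mergenthaler codes, parity removed, case bit added
INT = [0o21, 0o22, 0o43]         # internal (modified Dura Mach) counterparts


def to_internal(code: int) -> int:
    """Translate via the parallel tables: same index in AXACT2 and INT."""
    try:
        return INT[AXACT2.index(code)]
    except ValueError:
        raise ValueError(f"unidentified character {code:o}") from None


def to_mergenthaler(internal: int) -> int:
    try:
        return AXACT2[INT.index(internal)]
    except ValueError:
        raise ValueError(f"no Mergenthaler counterpart for {internal:o}") from None
```

An unknown code raises an error, which corresponds to rejection condition (e), an unidentified character in the matrix.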


It contains the definitions of the following large blocks:

SCRUB list
B list
Ex list
Y
G

(700 locations, used by CONVRT and shared by REGRUP)


It makes use of a subroutine, DELETE, which can delete from processing any compounds whose TIDs are entered on data cards and stored in REGTAB and TAPTAB (supplied by the input program).
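DELETE uses the REGTAB/TAPTAB arrangement: five registry numbers per paper tape entry, with a further entry when a tape has more. That arrangement might be modelled as follows. The list-of-lists layout and the None padding are illustrative assumptions.

```python
PER_TAPE = 5   # five registry numbers per TAPTAB entry


def build_tables(deletions):
    """deletions: (paper tape number, TID) pairs read from data cards."""
    by_tape: dict[str, list[str]] = {}
    for tape, tid in deletions:
        by_tape.setdefault(tape, []).append(tid)
    taptab, regtab = [], []
    for tape, tids in by_tape.items():
        for i in range(0, len(tids), PER_TAPE):   # more than five TIDs: another entry
            chunk = tids[i:i + PER_TAPE]
            taptab.append(tape)
            regtab.append(chunk + [None] * (PER_TAPE - len(chunk)))
    return taptab, regtab


def should_delete(taptab, regtab, tape, tid) -> bool:
    """True if this TID was listed for deletion on this paper tape."""
    return any(t == tape and tid in group for t, group in zip(taptab, regtab))
```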


Input to ORGNZR consists of:

(1) DELX - width of the matrix.

(2) MATRIX - the reconstructed chemical record.

(3) STELOC - the location of the STEREO field in the matrix.

(4) REGTAB and TAPTAB - tables giving the paper tape numbers and TIDs of compounds to be deleted.

Output from ORGNZR is as follows:

(1) SCRUB - list of all structural characters and their relative matrix locations.


(4) DELY - actual y coordinate size of the matrix required by the structure.

(5) ASCRUB - pointer to the end of the SCRUB list.

(6) BXBRAK - table of x coordinates of right-hand brackets in the structure, one entry per word in bits 18-35. The last x coordinate is equal to DELX.


(7) GRP2 - pointer to the last entry in SCRUB that lies within brackets if they are present, and to the last entry in SCRUB if brackets are not present.

(8) AXTMOD - absolute address of the first location of the SFI in the MATRIX.

(9) MULTAB - table of multipliers for each portion of a structure that is within a set of brackets. Each entry occupies bits 18-35 of a word, with the multiplier in bits 22-35. The last multiplier is set to 1.

(10) UNDTAB - a table of underlines (10 locations) which contains the SCRUB list entries of underlined characters, to be used in reconstructing the matrix for printout.

(11) MOLTAB - formatted molecular formula as described in Section 2.2.4.2.

(13) NOMTAB - formatted nomenclature and reference fields as described in Section 2.2.5.2.

(14) MATRIX - on output from ORGNZR, the structural formula in the matrix is stored in modified Dura Mach code: a case bit followed by the 6-bit Dura code in bits 30-35. The rest of the record in the matrix remains in Mergenthaler code.


The program produces diagnostic P-dumps when sense switch 5 is set. When sense switch 6 is set, the formatted information is printed at the end of ORGNZR.
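Step (5) of the ORGNZR program description produces SCRUB, BXBRAK and MULTAB, and can be sketched as follows. Several elements are hypothetical:

- the corner tokens;
- the dot_connected helper;
- the placement of a multiplier in the cell immediately to the right of the lower right corner;
- the row-major relative location y * DELX + x.

```python
UPPER_LEFT, LOWER_RIGHT = "UL", "LR"          # hypothetical corner tokens
OTHER_BRACKET = {"UR", "LL", "SIDE"}          # erased bracket pieces (hypothetical)
DOT = "."


def build_sfi(cells, delx, dot_connected):
    """Build SCRUB, BXBRAK and MULTAB from a structure region.

    cells: {(x, y): token} for the structural characters.
    dot_connected(x, y): True if the dot at (x, y) touches a bond (assumed helper).
    """
    scrub, bxbrak, multab = [], [], []
    consumed = set()
    for x, y in sorted(cells, key=lambda p: (p[1], p[0])):   # row by row
        if (x, y) in consumed:
            continue
        tok = cells[(x, y)]
        if tok in OTHER_BRACKET:
            continue                          # only the UL and LR corners are kept
        if tok == LOWER_RIGHT:
            bxbrak.append(x)
            right = cells.get((x + 1, y))
            if right is not None and right.isdigit():
                multab.append(int(right))     # assumption: multiplier at x + 1
                consumed.add((x + 1, y))
        elif tok == DOT and not dot_connected(x, y):
            bxbrak.append(x)                  # a free dot marks a monovalent salt
        scrub.append((tok, y * delx + x))     # character and relative matrix location
    bxbrak.append(delx)                       # last entry: maximum x coordinate
    return scrub, bxbrak, multab
```

BXBRAK is left unsorted here, since ordering it is REGRUP's job.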


Figure 17. Macro Flow Chart - ORGNZR (continued)

2.2.4 Molecular Formula Format Program

Code Name: MOLFRM

Programmer: Helen Hill

Abstract: This program formats the molecular formulas.


2.2.4.1 Program Description

A macro flow chart describing this program is presented in Figure 19.

ORGNZR provides MOLFRM with a pointer to the first molecular formula character in the matrix. MOLFRM then produces the formatted molecular formula.

The following are definitions* of syntactical terms:

<PREFIX> ::= <NUMBER> | <EMPTY>

<SYMBOL> ::= <CAPITAL LETTER> | <CAPITAL LETTER> <SMALL LETTER>

<SEPARATOR> ::= <DOT> | <BLANK> <DOT> | <BLANK> <DOT> <BLANK> | <DOT> <BLANK>

<TERMINATOR> ::= <BLANK> <BLANK>

<SUBSCRIPT> ::= <NUMBER>

<COMPOUND SYMBOL> ::= <SYMBOL> | <SYMBOL> <SUBSCRIPT>

<FORMULA PART> ::= <COMPOUND SYMBOL> | <FORMULA PART> <COMPOUND SYMBOL>

The program requires the following syntax for the Hill molecular formula of a simple, non-hydrated compound:

<HILL MOLFORM> ::= <FORMULA PART> <TERMINATOR>

The hydrated Hill molecular formula must have the following syntax, which allows a multiplier for either or both the Hill parent and the water:

<HYDRATED HILL MOLFORM> ::= <PREFIX> <FORMULA PART> <SEPARATOR> <PREFIX> <FORMULA PART> <TERMINATOR>
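The Hill productions above translate directly into regular expressions. This sketch assumes a linear rendering of the formula line: blank as a space, dot as a period, subscripts as plain digits, and integer prefixes only. It returns each formula part with its multiplier rather than building MOLTAB itself.

```python
import re

SYMBOL = r"[A-Z][a-z]?"
COMPOUND_SYMBOL = rf"{SYMBOL}\d*"
FORMULA_PART = rf"(?:{COMPOUND_SYMBOL})+"
SEPARATOR = r"(?: \. | \.|\. |\.)"
TERMINATOR = r"  "

HILL = re.compile(rf"({FORMULA_PART}){TERMINATOR}")
HYDRATED = re.compile(
    rf"(\d*)({FORMULA_PART}){SEPARATOR}(\d*)({FORMULA_PART}){TERMINATOR}"
)


def atom_counts(part: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for sym, n in re.findall(rf"({SYMBOL})(\d*)", part):
        counts[sym] = counts.get(sym, 0) + (int(n) if n else 1)
    return counts


def parse_hill(text: str) -> list[tuple[int, dict[str, int]]]:
    """Return (multiplier, atom counts) per formula part, or raise on a syntax error."""
    m = HILL.fullmatch(text)
    if m:
        return [(1, atom_counts(m.group(1)))]
    m = HYDRATED.fullmatch(text)
    if m:
        p1, part1, p2, part2 = m.groups()
        return [(int(p1 or 1), atom_counts(part1)), (int(p2 or 1), atom_counts(part2))]
    raise ValueError(f"molecular formula syntax error: {text!r}")
```

A formula beginning with a lower case letter matches neither production, which corresponds to rejection condition (c).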

The addend molecular formula, which must be present in addition to the Hill molecular formula for addends, is either preceded by a colon (:) or surrounded by an upper left and an upper right bracket. The addend molecular formula requires the use of the following expanded definitions:

<TERMINATOR> ::= <BLANK> <BLANK> | <UPPER RIGHT CORNER> <BLANK> <BLANK>

<SIGNAL> ::= : | <UPPER LEFT CORNER>

<SEPARATOR> ::= <DOT> | <BLANK> <DOT> | <BLANK> <DOT> <BLANK> | <DOT> <BLANK> | <UPPER RIGHT CORNER> <DOT> | <UPPER RIGHT CORNER> <DOT> <BLANK>


* Backus-Naur Form.

<ADDEND PART> ::= <PREFIX> <FORMULA PART> <SEPARATOR>

<FINAL ADDEND PART> ::= <PREFIX> <FORMULA PART> <TERMINATOR>

The syntax required for the addend molecular formula is as follows:

<ADDEND MOLFORM> ::= <ADDEND PART> <FINAL ADDEND PART>
    | <ADDEND PART> <ADDEND PART> <FINAL ADDEND PART>
    | <ADDEND PART> <ADDEND PART> <ADDEND PART> <FINAL ADDEND PART>

The addend molecular formula syntax allows for handling compounds consisting of up to four addend parts. In the case of addend molecular formulas, water of hydration is treated as an addend part. Provision has been made, however, not to reject a compound in which the typist has placed the dot and the water outside the upper right corner bracket.

This program assumes that all polymers admitted to the system are condensation polymers and will appear with the sum of their atom counts in the Hill molecular formula. Condensation polymers of indefinite size are assumed to be present in a (...)n form, and a bit is set by ORGNZR which instructs MOLFRM to ignore the n.

2.2.4.2 Program Structure

MOLFRM is a subroutine requiring 448 core locations. It uses tables in ORGNZR when recognizing Mergenthaler characters and converting them to BCD. Its output, MOLTAB, is a 25-location table.

The input to MOLFRM is:

(a) Pointer to the first molform character in MATRIX

(b) MATRIX

The output from MOLFRM is:

MOLTAB - the formatted molecular formula, as shown in Figure 18.


WORD 0 (header): multiplier of the Hill formula, and a bit set if the compound is an indefinite polymer. The multiplier contains a flag (1 if the multiplier is a fraction, 0 if it is an integer), a numerator and a denominator; if it is not a fraction, the multiplier fills 8 bits, right justified.

WORD 1: number of oxygen atoms, number of nitrogen atoms, number of hydrogen atoms, number of carbon atoms, and number of words in the (Hill) molecular formula.

WORDS 2-M (Hill): element kind (BCD, bits 0-11), number of atoms of element (bits 12-17), element kind (BCD, bits 18-29), number of atoms of element (bits 30-35).

WORD M+1: multipliers of the first, second, third and fourth addends (bits 0-8, 9-17, 18-26, 27-35). The format of each multiplier is the same as the WORD 0 multiplier.

WORD M+2: same as WORD 1, but for the addend molform.

WORD M+3 onward: same as WORDS 2-M, but for the addend molform.

Figure 18. Formatted Molecular Formula
In the preceding, the following rules hold:

1. If the element is a single-letter element, the first character of its element kind in words 2-M is a BCD blank.

2. If no addend molform exists, the block ends with word M. If there is more than one addend, there is a separate block for each addend, containing words M+2 to M+n for that addend.

3. If the only addend is water, no addend form appears. If there is an addend in addition to water, the water appears in the Hill form as above and in the addend form. When the only addend is water, the addend bit is not set in the header word.
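The 9-bit multiplier of Figure 18 can be sketched as follows. Three points are assumptions: the eight value bits split into a 4-bit numerator and a 4-bit denominator, the flag is the high-order bit, and unused addend fields are zero-filled. The following points come from the figure: the fraction flag, the 8-bit right-justified integer form, and the four 9-bit fields of word M+1.

```python
from fractions import Fraction


def pack_multiplier(value) -> int:
    """9 bits: flag 0 + 8-bit integer, or flag 1 + 4-bit numerator + 4-bit denominator."""
    m = Fraction(value)
    if m < 0:
        raise ValueError("negative multiplier")
    if m.denominator == 1:
        if m.numerator > 0xFF:
            raise ValueError("integer multiplier does not fit in 8 bits")
        return m.numerator
    if m.numerator > 0xF or m.denominator > 0xF:
        raise ValueError("fraction does not fit the assumed 4-bit fields")
    return (1 << 8) | (m.numerator << 4) | m.denominator


def unpack_multiplier(field: int) -> Fraction:
    if field >> 8:                                   # fraction flag set
        return Fraction((field >> 4) & 0xF, field & 0xF)
    return Fraction(field & 0xFF)


def pack_addend_multipliers(values) -> int:
    """WORD M+1: up to four 9-bit multipliers, first addend in bits 0-8."""
    values = list(values)
    if len(values) > 4:
        raise ValueError("at most four addend parts")
    word = 0
    for v in values + [0] * (4 - len(values)):      # assumption: unused fields are zero
        word = (word << 9) | pack_multiplier(v)
    return word
```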


Figure 19. Macro Flow Chart - MOLFRM


2.2.5.1 Program Description

A macro flow chart describing this program is presented in Figure 20.

MONIKR takes a pointer to the first word in the nomenclature and formats all the information it finds until a double asterisk (**) is encountered. The presence of a double bond or triple bond in this field signals the start of the reference field. MONIKR allows for the presence of underlines, superscripts, and any character which can be typed on a chemical typewriter, with the exception of certain bonds. It takes the characters, which are in modified Mergenthaler code in the MATRIX, and translates them to the following form for storage:


9-bit character: superscript bit, Dura case bit, and 6-bit Dura code.


Each line of nomenclature is taken to be all the material on the same y coordinate that is followed by at least three blanks in the matrix, and it is delimited from the other lines in the output table. The same is done with the reference material.
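The line rule can be sketched as follows. The sketch assumes each matrix row is rendered as a string and reads "followed by at least three blanks" as meaning that a run of three or more blanks ends a line. The 400-character check mirrors MONIKR's stated limit. Counting characters without delimiters is an assumption.

```python
import re

GAP = re.compile(r" {3,}")   # three or more blanks end a line (assumed reading)
MAX_CHARS = 400


def nomenclature_lines(rows: list[str]) -> list[str]:
    """Split matrix rows (one string per y coordinate) into nomenclature lines."""
    lines = []
    for row in rows:
        for piece in GAP.split(row):
            piece = piece.strip()
            if piece:
                lines.append(piece)
    if sum(len(line) for line in lines) > MAX_CHARS:
        raise ValueError("more than 400 characters: compound rejected")
    return lines
```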


MONIKR allows up to 400 characters in these fields and rejects the compound if there are more. It also checks to make certain it has not exceeded the matrix area while getting characters to store.

MONIKR calls PUNCH if it is desired to punch out certain descriptor information on cards.


2.2.5.2 Program Structure

This program is a subroutine which requires 423 core locations. It uses tables in ORGNZR to recognize Mergenthaler characters and translate them to modified Dura Mach.

Input to MONIKR consists of:

(1) Pointer to the first character in the field.

(2) MATRIX - described in Section 2.2.1.2.
Output from MONIKR consists of NOMTAB, a table of up to 100 locations containing the formatted nomenclature and reference field. Each nomenclature entry (one line of nomenclature) is delimited from the next by a 777 (octal) character. The last nomenclature word is filled out with zeros. References are separated from each other by a 777 (octal) character, and the beginning of the reference field is preceded by a triple bond character in the output block. The last word in the reference field is also filled out with zeros.

Header of nomenclature block:
Bits 3-17: two's complement of the number of nomenclature words
Bits 21-35: two's complement of the total number of words in the block

Remaining words: 4 characters per 7040 word (bits 0-8, 9-17, 18-26, 27-35)
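Packing NOMTAB can be sketched as follows:

- four 9-bit characters per 36-bit word;
- entries separated by the octal 777 character;
- the last word zero-filled;
- a header with 15-bit two's-complement counts in bits 3-17 and 21-35.

The 1 + 2 + 6 split of each 9-bit character is an assumption drawn from the labels superscript, Dura case and 6-bit Dura code.

```python
DELIMITER = 0o777     # separates nomenclature entries
CHARS_PER_WORD = 4


def char9(code6: int, case2: int, superscript: bool = False) -> int:
    """Assumed layout: superscript bit, two case bits, six-bit Dura code."""
    return (int(superscript) << 8) | (case2 << 6) | code6


def pack_words(chars: list[int]) -> list[int]:
    """Pack 9-bit characters four to a 36-bit word; zero-fill the last word."""
    words = []
    for i in range(0, len(chars), CHARS_PER_WORD):
        group = chars[i:i + CHARS_PER_WORD]
        group += [0] * (CHARS_PER_WORD - len(group))
        word = 0
        for c in group:
            word = (word << 9) | c
        words.append(word)
    return words


def nomenclature_words(entries: list[list[int]]) -> list[int]:
    """Join entries with the 777 delimiter and pack them."""
    chars: list[int] = []
    for i, entry in enumerate(entries):
        if i:
            chars.append(DELIMITER)
        chars.extend(entry)
    return pack_words(chars)


def header(n_nomenclature: int, n_total: int) -> int:
    """Bits 3-17 and 21-35 (bit 0 = high-order) hold 15-bit two's complements."""
    return (((-n_nomenclature) & 0x7FFF) << 18) | ((-n_total) & 0x7FFF)
```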


Figure 20. Macro Flow Chart - MONIKR
2.2.6 Descriptor Punch Program


Code  Name:  PUNCH 

Programmer: R. Chao

Abstract: This program finds the EA, T, and TL descriptors, if there are
any. It then gets the corresponding TID number of the compound and punches it
on a card followed by the EA, T, or TL descriptor number.

2.2.6.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  21. 

PUNCH gets the matrix and the first location of the reference field as
input from the main program and searches for EA, T, and TL descriptors. If one
of the descriptors listed above is found, a card is punched containing the TID
compound number and the descriptor.

2.2.6.2 Program Structure

PUNCH is a subroutine called from MONIKR which takes as input the MATRIX
and RFIELD, which points to the reference field. Its output is cards with the
following format:

Card Column   Contents
1-12          12-character TID # (left justified)
13            Blank
14, 15        Code: EA, T, TL
16-20         Number (right justified)
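Here is a minimal sketch of the card image PUNCH produces. It assumes an 80-column card. It also assumes the code is left justified within columns 14-15, because the text does not say how the one-letter code T is placed. `punch_card` is a hypothetical name.

```python
def punch_card(tid: str, code: str, number: int) -> str:
    """Return an 80-column card image in the PUNCH layout described above."""
    if len(tid) > 12:
        raise ValueError("TID must fit in columns 1-12")
    if code not in ("EA", "T", "TL"):
        raise ValueError("descriptor code must be EA, T or TL")
    if not 0 <= number <= 99999:
        raise ValueError("descriptor number must fit in columns 16-20")
    card = (
        tid.ljust(12)            # columns 1-12: TID, left justified
        + " "                    # column 13: blank
        + code.ljust(2)          # columns 14-15: code (left justification assumed)
        + str(number).rjust(5)   # columns 16-20: number, right justified
    )
    return card.ljust(80)        # remaining columns left blank
```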


Figure 21. Macro Flow Chart - PUNCH


2.2.7 SFI Reordering Program


Code  Name:  REGRUP 

Programmer:  Helen  Hill 

Abstract: This program reorders the SFI when brackets or a monovalent
salt are present so that all characters within a given set of x coordinates
appear compactly in the SFI.

2.2.7.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  22. 

REGRUP takes the list of x coordinates of lower right corner brackets and
reorders them so that the x coordinates are in ascending order, checking for
the following error conditions:

(1)  2  identical  x  coordinates 

(2)  An  empty  bracket  list 

These errors result in the rejection of a chemical record. REGRUP then takes
each entry in the SCRUB list, computes the x coordinate from the relative matrix
location, and stores this entry in the appropriate list. The following errors
result in rejection of a compound:

(1)  Too  many  characters  for  a  given  list 

(2)  Empty  or  incorrect  bracket  list 

REGRUP  stores  a  pointer  to  the  last  entry  in  each  list  (which  corresponds 
to  one  set  of  brackets,  the  final  set  being  assumed  to  be  everything  to  the 
right  of  the  last  explicit  bracket)  in  MULTAB  which  will  be  used  by  VERIFY.  It 
then  replaces  the  reordered  lists  in  order  in  the  SFI  and  stores  a  pointer  to 
the  first  character  beyond  explicit  brackets  (if  any).  REGRUP  counts  up  all 
plus  and  minus  charges  outside  of  explicit  brackets  and  stores  the  totals  in 
the  Header  word  of  the  SFI.  If  characters  appear  outside  the  explicit  brackets, 
REGRUP  calls  EXCESS  to  format  them. 
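The bracket sort and list partitioning can be sketched as follows. The sketch assumes that a relative matrix location is `y * delx + x` (row-major). It also assumes that a character belongs to the first bracket set whose lower right corner lies at or to the right of it. The per-list capacity check, the multipliers and the charge totals are left out. `regroup` and its return layout are illustrative, not the REGRUP macros.

```python
from bisect import bisect_left


def regroup(bracket_xs, scrub, delx):
    """Partition SCRUB entries by bracket set.

    bracket_xs: x coordinates of lower right corner brackets (any order).
    scrub: list of (character, relative_matrix_location) pairs.
    delx: width of the MATRIX; relative location assumed to be y * delx + x.
    Returns (reordered_scrub, last_entry_pointers, first_beyond_brackets).
    """
    if not bracket_xs:
        raise ValueError("empty bracket list")
    xs = sorted(bracket_xs)
    if any(a == b for a, b in zip(xs, xs[1:])):
        raise ValueError("two identical x coordinates")
    groups = [[] for _ in range(len(xs) + 1)]  # last group: right of last bracket
    for char, loc in scrub:
        x = loc % delx
        groups[bisect_left(xs, x)].append((char, loc))
    reordered, last_ptrs = [], []
    for group in groups:
        reordered.extend(group)
        last_ptrs.append(len(reordered) - 1)  # pointer to last entry of this list
    first_beyond = len(reordered) - len(groups[-1])
    return reordered, last_ptrs, first_beyond
```

`last_ptrs` plays the role of the MULTAB pointers to the last entry of each list. `first_beyond` plays the role of the pointer to the first character beyond explicit brackets.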

2.2.7.2 Program Structure

REGRUP  is  a  subroutine  which  requires  303  core  locations.  It  uses  areas 
defined  in  ORGNZR  (13^0  locations)  to  reorder  the  various  lists  and  it  uses  the 
SCRUB  list  from  ORGNZR  (701  locations). 

It is made up of 5 macros:

(a) ADDUP and ORDER - sort the brackets.

(b) LIST - puts an entry in the proper list and increments the right
pointers.

(c) REFORM - transmits the reordered lists to the SCRUB list after
reordering.

(d) STOMLT - stores the necessary information in MULTAB.

REGRUP  takes  as  input  the  following: 

SCRUB  -  SFI  characters  and  their  Relative  Matrix  location. 

BXBRAK  -  list  of  x  coordinates  of  lower  right  corner  brackets. 
MULTAB  -  contains  multiplier  associated  with  each  set  of  brackets. 
ASCRUB  -  pointer  to  end  of  SCRUB. 

Output  from  REGRUP  is  as  follows: 

(a)  Reordered  SCRUB 

(b)  MULTAB  -  with  pointers  to  last  character  in  SCRUB  using  a 
given  multiplier 


MULTAB entry: 2's complement pointer to the last entry in the SCRUB list for
this multiplier (and this set of brackets).

Figure 22. Macro Flow Chart - REGRUP
2.2.8 Structure Of Non-Bracketed Information


Code Name: EXCESS

Programmer: Helen Hill

Abstract: EXCESS formats all structural characters appearing outside
of brackets when brackets appear in the structure.

2.2.8.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Fig.  23. 

EXCESS formats information found outside brackets in structures such
as H₂SO₄, since this portion of the compound does not appear in the connection
table and is needed for chemical verification. EXCESS uses the pointer to the
first word outside brackets in the SCRUB list to locate that character in the
matrix, using the relative matrix location in the SCRUB list for that character.
It then sets a bit (ION) to be used by VERIFY and looks for the first addend dot
in the matrix outside of brackets. It formats all characters on that y coordinate
from that x coordinate to the end of the line. EXCESS totals plus and minus
charges outside brackets, using prefix multipliers if any. EXCESS requires the
following syntax in the information it formats:

<SYMBOL> ::= <CAP> | <CAP> <SMALL LETTER>

<SUBSCRIPT> ::= <NUMBER>

<COMP. SYMBOL> ::= <SYMBOL> | <SYMBOL> <SUBSCRIPT>

<FORMULA PART> ::= <COMP. SYMBOL> | <FORMULA PART> <COMP. SYMBOL>

<EXCESS FORMULA PART> ::= <SIGNAL> <PREFIX> <FORMULA PART> <COMP. SEPARATOR>

<PREFIX> ::= <NUMBER> | ∅

<SIMPLE SEPARATOR> ::= <BLANK> | ∅

<COMP. SEPARATOR> ::= <SEPARATOR> | <SEPARATOR> <SEPARATOR> | + <SEPARATOR> | , <SEPARATOR>

<SIGNAL> ::= · | · <BLANK>

Everything that is found having the same y coordinate as the first addend dot
outside the brackets is formatted, provided it satisfies the following definition:

<FORMAT> ::= <EXCESS FORMULA PART> | <FORMAT> <EXCESS FORMULA PART>

The prefix in this case is used to multiply each compound symbol until
the next prefix is found.

When water is found in the string it is ignored. EXCESS allows any
combination such as (·H₂SO₄·2Cl⁻·H₂O) as long as the elements are all typed on
the same line.
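The grammar above can be exercised with a small regular-expression scanner. The sketch makes three assumptions. First, the line has already been translated into plain text, with `·` as the addend dot and subscripts written as ordinary digits. Second, a prefix applies only within its own formula part, so a part without a prefix gets multiplier 1. Third, charge totals are out of scope. `excess_totals` is a hypothetical name.

```python
import re
from collections import Counter

SYMBOL = re.compile(r"([A-Z][a-z]?)(\d*)")  # <SYMBOL> with optional <SUBSCRIPT>
PREFIX = re.compile(r"\s*(\d*)\s*")          # optional <PREFIX>


def excess_totals(line: str, signal: str = "·") -> Counter:
    """Total element counts in the non-bracketed part of a formula line."""
    totals = Counter()
    for part in line.split(signal)[1:]:  # each part starts after a <SIGNAL>
        m = PREFIX.match(part)
        multiplier = int(m.group(1)) if m.group(1) else 1
        body = part[m.end():].rstrip(" ,+")  # drop the trailing <COMP. SEPARATOR>
        if body == "H2O":
            continue  # water is ignored
        pos = 0
        while pos < len(body):
            sm = SYMBOL.match(body, pos)
            if sm is None:
                raise ValueError(f"syntax error at {body[pos:]!r}")
            totals[sm.group(1)] += multiplier * int(sm.group(2) or 1)
            pos = sm.end()
    return totals
```

For a hypothetical input `"·H2SO4·2HCl·H2O"`, the water part is skipped and the `2` prefix doubles the HCl atoms.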


2.2.8.2 Program Structure


EXCESS  is  a  subroutine  which  requires  335  core  locations  and  utilizes 
tables  AXACT2  and  INT  in  ORGNZR.  It  contains  a  subroutine,  SEARCH,  which 
takes  a  given  SCRUB  list  character  and  translates  it  to  Mergenthaler  code 
to  allow  range  tests  to  be  done  for  syntax  analysis. 

EXCESS takes as input the ORGNZR output described in Section 2.2.3.2.


Output from EXCESS is the following:

XTAB - a table formatted as follows: ... element, the second
character is a BCD blank.

PL - total plus charges outside brackets
MIN - total minus charges outside brackets

ION - set to signal that there is something in XTAB for VERIFY to
use.

Figure 23. Macro Flow Chart - EXCESS
2.2.9 Error Message Program

Code Name: APOLGY

Programmer: Helen Hill

Abstract: APOLGY writes error messages.

2.2.9.1 Program Description

APOLGY is transferred to from ORGNZR, EXCESS, REGRUP, MOLFRM and MONIKR
to write error messages using Fortran read-write routines.


Macro Flow Chart - APOLGY (Print appropriate message; INPUT or TAPWRM)


2.2.9.2 Program Structure

APOLGY is a subroutine.
2.2.10 Linear String Classification


Code  Name:  SETUP 

Programmer:  Bruce  Hack 

Abstract: This program finds a capital letter in the SCRUB list
and then scans to the left and right of this letter in the MATRIX, assigning a
type code to the linear string. It then transfers to CLEANM for processing.

2.2.10.1  Program  Description 

A macro flow chart describing this program is presented in Figure 24.

SETUP  scans  the  SCRUB  list  until  a  capital  letter  is  found  and  then  locates 
the  left  bound  of  the  linear  string  on  the  basis  of  the  following  definition: 

Definition:  A  linear  string  is  a  set  of  symbols  on  a  horizontal  line 
bounded  on  the  left  and  right  by  bonds  or  blanks  and  containing  at  least  one 
capital  letter. 

A  left  to  right  scan  is  made  of  the  linear  string  placing  each  symbol  in 
one  of  the  following  classifications: 

(a)  C 

(b)  P 

(c)  H 

(d)  Other  capital 

(e)  Small  letter 

(f) (

(g) )

(h) number

(i) Illegal symbol

The  concatenation  of  these  classifications  is  the  code  for  the  string. 

e.g., -(CH2)6 has the code: fachgh

An  illegal  symbol  immediately  causes  rejection  of  the  record.  Control  is 
given  to  CLEANM  upon  concatenation. 
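Here is a sketch of the string extraction and classification. It assumes the typed line is available as a Python string and that blanks and the characters `- = | / \` are the bounding bonds. The actual Mergenthaler bond codes differ. The function names are hypothetical.

```python
BOUNDS = set(" -=|/\\")  # assumed blank and bond characters that bound a string
SPECIAL = {"C": "a", "P": "b", "H": "c", "(": "f", ")": "g"}


def linear_string(row: str, i: int) -> str:
    """Return the linear string containing the capital letter at row[i]."""
    left = right = i
    while left > 0 and row[left - 1] not in BOUNDS:
        left -= 1
    while right + 1 < len(row) and row[right + 1] not in BOUNDS:
        right += 1
    return row[left:right + 1]


def classify(ch: str) -> str:
    """Map one symbol to its classification letter (a) through (i)."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    if "A" <= ch <= "Z":
        return "d"  # other capital
    if "a" <= ch <= "z":
        return "e"  # small letter
    if "0" <= ch <= "9":
        return "h"  # number
    return "i"      # illegal symbol


def string_code(s: str) -> str:
    """Concatenate the classifications; an illegal symbol rejects the record."""
    code = "".join(classify(ch) for ch in s)
    if "i" in code:
        raise ValueError(f"illegal symbol in {s!r}: record rejected")
    return code
```

For the row `-(CH2)6-`, the extracted string `(CH2)6` yields the code fachgh quoted above.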


2.2.10.2 Program Structure


SETUP is a main program which receives control from ORGNZR and gives control
to CLEANM when finished.

Input to SETUP consists of the SCRUB list and the MATRIX. These are described in
Section 2.2.3.2.

Output  from  SETUP  is  LINSTG,  the  code  classification  of  a  node. 
Figure 24. Macro Flow Chart - SETUP
2.2.11 Reduction Of The Matrix Around An Atom


Code Name: CLEANM


Programmer:  Bruce  Hack 

Abstract: CLEANM is given a pointer to a specific node by SETUP.
It then "cleans" the eight locations around that node in the matrix for use
in MAKECT. All charge signs and mass numbers are removed, double letter
elements are replaced by a one word symbol, and special cases such as Ph and
-(C)n- are treated. An abnormality table of abnormal masses, charges and
valences is created. A connection table number is assigned to each atom,
and the word in SCRUB corresponding to a node which has been processed by
CLEANM is made minus. Control is returned to SETUP after operation on the
given node is complete.

2.2.11.1  Program  Description 

A macro flow chart describing this program is presented in Figure 25.

A code for the node to be operated on is provided by SETUP, classifying
the node in one of the following classes:


(a) Single letter atom (e.g., C)

(b) Double letter atom (e.g., Cl)

(c) H

(d) Ph (representing a phenyl group)

(e) -(C)n- (reduced carbon chain)

(f) < 6 >


In  the  typed  structure,  it  is  assumed  that  any  symbol  in  any  of  the 
8  positions  surrounding  an  atom  belongs  to  that  atom. 

Example 

The  bond  shown  is  assumed  to  belong 
to  the  N.  This  positioning  of  a 
bond  is  incorrect  and  the  computer 
will  reject  the  record. 


/ 

N 

/_ 

N 

The  bond  shown  here  is  assumed  not 
to  belong  to  the  N. 
NOTE:

By "belong to" we mean connected to in the case of a bond, or
associated with in the case of a charge sign, mass number, etc.
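The eight surrounding positions can be listed as in this sketch. It assumes that a relative MATRIX location is `y * DELX + x`, with y increasing down the page. `neighbors` is a hypothetical helper, and positions that fall outside the MATRIX are dropped.

```python
def neighbors(loc: int, delx: int, dely: int) -> list:
    """Relative MATRIX locations of the (up to) 8 positions around loc."""
    x, y = loc % delx, loc // delx
    around = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dx, dy) != (0, 0) and 0 <= x + dx < delx and 0 <= y + dy < dely:
                around.append((y + dy) * delx + (x + dx))
    return around
```

Under the same layout, the upper-left position checked for a mass number is `loc - delx - 1`, and the upper-right position checked for a charge is `loc - delx + 1`.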


The  following  actions  are  taken  by  CLEANM: 

(a) The sign of the SCRUB word pointing to this atom is made
minus for use in MAKECT, and the next available connection
table number is then assigned to this atom. This number
is placed in the MATRIX word containing the atom. A search
is then made around the atom, counting the connecting bonds
for comparison with the normal valence for that atom. If
the valence is found to be larger than the normal valence,
an entry is made in the abnormality table. The upper left
position is checked for a mass number. If one is found, an entry
is made in the abnormality table. The number is then removed
from the matrix and replaced by a bond if necessary.


e.g., MATRIX before and after removal of a mass number.

Note: The "1" need not be erased since it will not interfere
with future processing.

Similarly  the  upper  right  is  checked  for  a  charge  and  an  entry  is 
made  in  the  abnormality  table  if  one  is  found. 


(b) The lower case letter of a 2 character element is placed in
bits 24-29 of the MATRIX entry for the upper case letter, and
the lower case letter is replaced in the matrix by a bond
connection if necessary. The following example describes
this process.


Illustration: MATRIX before (b) and MATRIX after (b).

Control is then given to case (a).

(c) The presence of a hydrogen atom with a single connection is
ignored. A hydrogen atom with two connections is treated as
a single letter atom and control is given to case (a). The
presence of a hydrogen in the connection table will later
cause rejection of the record during chemical verification.

(d) The phenyl group (Ph) is replaced by a single C atom in the
MATRIX, and the internal connections of the expanded phenyl
group are placed directly in the connection table. The sign
of the SCRUB list word containing the "P" is made minus, and
the first and last connection table numbers of the atoms
included in the expansion are entered into the C word in the
MATRIX.


Illustration: MATRIX before and after (d).

(e) Carbon chains are replaced by a special code in the MATRIX.
The SCRUB list words are then handled as in case (d).

Illustration: MATRIX before and after (e).

CLEANM rejects chemical records for the following reasons:
(1) More than 19 abnormalities present in compound.
(2) Inadmissible string found in structural formula.

(3) Illegal symbol found around an atom.

(4) Hydrogen found in the wrong place.


2.2.11.2 Program Structure

CLEANM is a main program which is transferred to by SETUP and which
transfers back to SETUP when processing is complete.

Input to CLEANM consists of the following:

(a) SCRUB - list which contains all structural formula information.

(b) MATRIX - the coded two-dimensional picture of the structural
formula.

(c) DELX - the width of the MATRIX.

Output from CLEANM is the following:


(a) Partial CONNECTION TABLE (i.e., expanded Ph and -(C)n-)

(b) ABNORMALITY TABLE - this table is a series of words, where
each word gives information about one atom which has abnormal
mass or valence or has a charge on it, i.e.:

Bits         Contents
S, 1, 2      Type of abnormality: 101 = charge, 110 = mass, 111 = valence
3-17         Atom number
18-35        Value of abnormal mass, abnormal valence, or signed charge.
             The sign of a signed charge is indicated by bit 18.

A word of zeros follows the last abnormality word.

(c)  The  MATRIX  "cleaned"  for  processing  by  MAKECT  and  containing 
only  nodes  and  bonds. 
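The abnormality word layout can be packed and unpacked as below. The sketch assumes IBM bit numbering, in which bit 0 (the S bit) is the most significant of 36. It also assumes sign-magnitude storage, with the sign in bit 18 and the magnitude in bits 19-35. The text itself only says that bit 18 carries the sign. All names are hypothetical.

```python
TYPE_CODES = {"charge": 0b101, "mass": 0b110, "valence": 0b111}
KIND_OF = {code: kind for kind, code in TYPE_CODES.items()}


def put_field(value: int, first: int, last: int) -> int:
    """Place value in bits first..last of a 36-bit word (bit 0 = most significant)."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in bits {first}-{last}")
    return value << (35 - last)


def get_field(word: int, first: int, last: int) -> int:
    """Extract bits first..last of a 36-bit word (bit 0 = most significant)."""
    width = last - first + 1
    return (word >> (35 - last)) & ((1 << width) - 1)


def abnormality_word(kind: str, atom: int, value: int) -> int:
    word = put_field(TYPE_CODES[kind], 0, 2)          # bits S, 1, 2: type
    word |= put_field(atom, 3, 17)                    # bits 3-17: atom number
    word |= put_field(1 if value < 0 else 0, 18, 18)  # bit 18: sign (assumed)
    word |= put_field(abs(value), 19, 35)             # bits 19-35: magnitude
    return word


def decode_abnormality(word: int):
    kind = KIND_OF[get_field(word, 0, 2)]
    sign = -1 if get_field(word, 18, 18) else 1
    return kind, get_field(word, 3, 17), sign * get_field(word, 19, 35)
```

Because no type code is 000, the all-zero word that ends the table cannot be confused with an entry.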


Figure 25. Macro Flow Chart - CLEANM

Figure 25. Macro Flow Chart - CLEANM (continued)
2.2.12 Generation Of The Connection Table


Code Name: MAKECT

Programmer: Bruce Hack
John Powers


Abstract: This program assumes a MATRIX of nodes and connecting
lines. A list is generated for each node indicating the type of node (element
type), all associated points, and the connecting line types (bonds). This program
also indicates whether the atom is to be multiplied in order to be correctly
compared with the molecular formula during chemical verification.


2.2.12.1  Program  Description 

A  macro  flow  chart  describing  this  program  is  provided  in  Figure  26. 

The SCRUB list contains each atom, bond, or other typed symbol in the structural
formula and its relative MATRIX location. All atoms have been marked by a previous
program (CLEANM) (i.e., capital letters in the SCRUB list are now minus). The
MATRIX now contains only atoms and bonds. The program proceeds by first finding a
minus sign in SCRUB and then calculating the position of this atom in core (i.e.,
the absolute MATRIX location).

Search  is  then  made  in  the  eight  locations  around  the  atom  until  a  bond 
is  found.  The  bond  type  is  noted  and  the  bond  is  followed  until  a  change  in  the 
symbol  type  occurs.  This  new  character  is  classified  as  one  of  the  following 
cases  and  the  appropriate  action  is  taken: 

(a)  Another  atom  -  place  the  proper  entry  in  the  Connection 
Table. 


(b)  A  blank  -  check  to  see  if  it  is  a  bond  corner.  If  so,  con¬ 
tinue  path;  if  not  assume  that  an  unknown  attachment  has 
been  found  and  place  an  entry  in  the  Connection  Table. 

(c)  A  reduced  carbon  chain  -  make  a  Connection  Table  entry  for 
a  carbon  at  this  location. 

(d) A hydrogen atom - if it has been assigned a Connection Table
number it is entered into the Connection Table. Otherwise,
it is ignored.

A  check  is  made  to  see  if  the  search  around  the  initial  atom  is  complete, 
and  that  all  bonds  have  been  followed.  If  so,  the  next  atom  is  located  in  the 
SCRUB  list  and  the  process  continues.  If  not,  the  next  bond  is  found,  and  the 
bond  is  followed  as  above.  In  the  case  of  a  carbon  chain  a  check  is  made  to  see 
that  the  bond  attachments  are  unambiguous.  At  the  completion,  all  multiplier 
pointers  are  inserted  in  the  Connection  Table. 
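The bond-following step can be sketched on a grid of characters. The sketch assumes the typed bond characters listed in `BOND_FOR_STEP` and stops at the first character that is not the same bond symbol. It leaves out bond corners, reduced carbon chains and the Connection Table entry itself. `follow_bond` is a hypothetical name.

```python
# Assumed typed bond characters for each step direction (row, column);
# rows grow downward, so "\" runs down-right and "/" runs down-left.
BOND_FOR_STEP = {
    (0, 1): "-=", (0, -1): "-=",
    (1, 0): "|", (-1, 0): "|",
    (1, 1): "\\", (-1, -1): "\\",
    (1, -1): "/", (-1, 1): "/",
}


def _inside(grid, r, c):
    return 0 <= r < len(grid) and 0 <= c < len(grid[r])


def follow_bond(grid, row, col, dr, dc):
    """Follow a bond leaving (row, col) in direction (dr, dc).

    Returns (bond_char, end_row, end_col, end_char), where end_char is the
    first symbol that is not part of the bond (" " if the path leaves the
    grid), or None if no bond leaves the atom in that direction.
    """
    r, c = row + dr, col + dc
    if not _inside(grid, r, c) or grid[r][c] not in BOND_FOR_STEP[(dr, dc)]:
        return None
    bond = grid[r][c]
    while _inside(grid, r, c) and grid[r][c] == bond:
        r, c = r + dr, c + dc
    end_char = grid[r][c] if _inside(grid, r, c) else " "
    return bond, r, c, end_char
```

A blank `end_char` corresponds to case (b) above, where the program must decide between a bond corner and an unknown attachment.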


2.2.12.2 Program Structure

MAKECT  is  a  main  program  that  takes  as  input  the  following: 

SCRUB - list of all typed characters in the structural formula and
their relative MATRIX locations.

MATRIX - now containing only nodes and bonds.

AXTMOD - the absolute address of the first character of the SFI in the
MATRIX is stored in the address of this location by ORGNZR.

MAKECT  rejects  chemical  records  for  the  following  reasons: 

(a)  Error  in  typing  structural  formula 

(b)  Illegal  symbol  in  structural  formula 

(c)  Bond  in  wrong  place 

(d) Typed symbols are too close for unambiguous analysis

(e)  Non-straight  attachments  to  carbon  chains 


Output from MAKECT is CT, the internal Connection Table, whose entries are
formatted as follows:

Bits     Contents
3-17     Number of atom bonded to
18-20    Multiplier
21-23    Bond type
24-29    2nd letter of atom
30-35    1st letter of atom

The multiplier points to an entry in the list of bracket multipliers (MULTAB,
described in Section 2.2.7.2) where applicable. Each atom has 8 such entries,
only the first of which contains the atom name. The second and third contain
the relative matrix location of this atom as follows:


Word 2 for a given atom:
Bits 3-23    Same as above (atom bonded to, multiplier, bond type)
Bits 24-35   3 low order digits of relative matrix location

Word 3 for a given atom:
Bits 3-23    Same as above (atom bonded to, multiplier, bond type)
Bits 24-35   2 high order digits of relative matrix location
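Field packing for the first Connection Table word of an atom might look like this sketch. It again assumes IBM bit numbering, with bit 0 the most significant. The 6-bit character codes in the example are arbitrary placeholders, not Dura Mach values, and the function names are hypothetical.

```python
FIELDS = {  # name: (first bit, last bit), bit 0 = most significant of 36
    "bonded_to": (3, 17),
    "multiplier": (18, 20),
    "bond_type": (21, 23),
    "second_letter": (24, 29),
    "first_letter": (30, 35),
}


def pack_ct_entry(**values: int) -> int:
    """Pack named fields into one 36-bit Connection Table word."""
    word = 0
    for name, value in values.items():
        first, last = FIELDS[name]
        if not 0 <= value < (1 << (last - first + 1)):
            raise ValueError(f"{name}={value} does not fit in bits {first}-{last}")
        word |= value << (35 - last)
    return word


def unpack_ct_entry(word: int) -> dict:
    """Split a 36-bit Connection Table word back into its named fields."""
    return {
        name: (word >> (35 - last)) & ((1 << (last - first + 1)) - 1)
        for name, (first, last) in FIELDS.items()
    }
```

The 3-bit multiplier field is consistent with an index into a short MULTAB list.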


Figure 26. Macro Flow Chart - MAKECT

Figure 26. Macro Flow Chart - MAKECT (continued)
2.2.13 Calling Program For Chemical Verification


Code  Name:  PHASE5 

Programmer: Bruce Hack

Abstract: This is the calling program for VERIFY.


2.2.13.1 Program Description

If the compound is found to be correct by verification, this program transfers
to NFCF. Otherwise, an error exit is taken and control is transferred to REJECT.

2.2.13.2 Program Structure

PHASE5 is a program that serves as a switch.


2.2.14 Chemical Verification

Code Name: VERIFY

Programmer: Helen Hill


Abstract: VERIFY checks the chemical consistency of the structural
formula, molecular formula, and connection table, and verifies the valence of
each element in the connection table and in the abnormality table.

2.2.14.1  Program  Description 

A macro flow chart describing this program is presented in Figure 27.

VERIFY uses a table (Table A) containing all elements and the acceptable
valences for each element. This table indicates whether an element is in an
odd or even group of the periodic table. A portion of the table is used as a
counter for storing element totals during processing. VERIFY proceeds as follows:

VERIFY first takes each atom in the connection table and totals all the
explicit bonds. It then looks in the abnormality table to determine if an
abnormality exists for this atom. If a valence abnormality is present, VERIFY
uses the abnormal valence as the valence for this atom and checks to see if it
is a legitimate valence for this element. If no valence abnormality is present,
the minimum valence for this element (which is found in bits 15-23 of Table A)
is used.

If a charge abnormality exists, the charge is subtracted from the hydrogen
count. The hydrogen count for a molecular structure is equal to the number of
explicit hydrogens plus the sum, over all atoms, of (valence of the atom minus
total explicit attachments to the atom minus total charges in the abnormality
table for the atom minus total unknown attachments from the atom).
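The hydrogen count formula translates directly into code. In this sketch, `attachments` is taken to be the sum of bond orders to the atom, which is an assumption. `charge` is subtracted exactly as the text states, without interpreting its sign. `CTAtom` and `hydrogen_count` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CTAtom:
    valence: int      # abnormal valence if recorded, else minimum valence from Table A
    attachments: int  # total explicit bonds to this atom (bond orders summed, assumed)
    charge: int = 0   # charge total from the abnormality table, subtracted as stated
    unknown: int = 0  # unknown attachments from this atom


def hydrogen_count(atoms, explicit_h: int = 0) -> int:
    """Assumed hydrogen count per the VERIFY formula described above."""
    return explicit_h + sum(
        a.valence - a.attachments - a.charge - a.unknown for a in atoms
    )
```

For methanol typed as C-O, the carbon contributes 3 and the oxygen 1, which matches the four hydrogens in CH4O.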

The program translates the atom type from Dura code to BCD. It then looks
for this atom type in Table A and adds one to the tabulation in Table A of the
total number of occurrences of this element in the connection table. Hydrogens
are accumulated in HCTR rather than in Table A. If any hydrogens are found in
Table A, this indicates the illegal presence of hydrogen in the connection table,
and the chemical record is rejected. If there is a multiplier for the atom
being processed, it is used to increase the atom count of this element by the
actual number of occurrences.

When all entries in the Connection Table have been processed, the program
checks to see if ION is set, indicating the presence of atoms not included in CT.
If ION is set, the program then uses subroutine IONIC to add these atoms, which
are formatted in XTAB, to the totals in Table A.

Next VERIFY totals the number of atoms of each element present in the
Hill molform and places them in Table A. C, H, N and O are totaled in MFC, MFH,
MFN, and MFO. Multipliers of the Hill molform found in the case of a hydrate are
applied in totaling the Hill parent, but the water is ignored. If the Hill
parent multiplier is a fraction, the compound is rejected.

The program then compares the Hill molform totals with the CT plus XTAB
totals and adds up the elements in the odd groups to be used in the H parity count.

If  ADDEND  is  found  to  be  set,  VERIFY  totals  the  addend  molform  atoms  using 
any  multipliers  present  and  compares  the  total  count  for  each  element  in  the 
addend  molform  with  the  totals  in  the  Hill  molform. 

VERIFY  next  totals  the  minus  and  plus  charges  found  attached  to  elements 
in  the  Connection  Table,  and  compares  the  totals  to  see  that  plus  charges  equal 
minus  charges. 

Finally VERIFY performs the Hydrogen Parity test on the Hill molform.
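The text does not spell out the Hydrogen Parity test. One common form is assumed here. In a neutral molecule the valences sum to twice the number of bonds, so the number of atoms with odd valence (hydrogen plus the odd-group elements) must be even. The element set below is an illustrative subset that stands in for the odd/even flags in Table A.

```python
# Illustrative subset of elements in odd groups of the periodic table
# (odd usual valence); VERIFY takes this information from Table A.
ODD_GROUP = {"H", "Li", "Na", "K", "B", "Al", "N", "P", "As", "F", "Cl", "Br", "I"}


def hydrogen_parity_ok(molform: dict) -> bool:
    """True if H plus the odd-group atom count is even."""
    odd_total = sum(n for element, n in molform.items() if element in ODD_GROUP)
    return odd_total % 2 == 0
```

A molform such as C2H6N fails this check, because seven hydrogens plus one nitrogen leaves an odd count of 7.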

Error  conditions  which  result  in  rejection  of  the  compound  are  the  following: 

(1) An illegal element was found in a molecular formula.

(2) An illegal element was found in the Connection Table.

(3) An element in the Connection Table has a high valence which is
incorrect for this element.

(4) The molecular formula contains a fraction.

(5) Addends are present but the first multiplier is zero.

(6) Hydrogen is present in the Connection Table.

(7) The assumed hydrogen count differs from the molform hydrogen
count.

(8) There was a hydrogen parity check error.

(9) Connection Table C, H, N, or O count differs from the molform
C, H, N, or O count.

(10) An illegal valence was found in the Connection Table.

(11) The C, H, N, or O count in the Hill molform differs from that in the
addend molform.

(12) Totals for elements other than C, H, N, or O are not the same in
the Hill and addend molforms.

(13) An illegal element was found outside of brackets.

(14) Connection Table total for C, H, N, or O is not equal to the Hill
molform total for the same element.

(15) The multiplier of the Hill molform in a hydrate is a fraction.

(16) Plus and minus charges do not balance.


2.2.14.2  Program  Structure 

VERIFY  is  a  subroutine  which  occupies  983  core  locations.  Table  A  of 
elements  and  valences  is  102  locations  long.  In  addition  there  is  a  table  which 
relates  modified  Dura  Mach  code  to  BCD. 

VERIFY  uses  the  following  input: 

CONNECTION TABLE - described in Section 2.2.12.2.

ABNORMALITY TABLE - described in Section 2.2.11.2.

PL - total plus charges outside of CT

MIN - total minus charges outside of CT

XPOINT - pointer to end of XTAB

ION - set if atoms exist outside of CT

MULT - pointer set if multipliers exist

MULTAB - table of multipliers from REGRUP described in Section 2.2.7.2

XTAB - table of atoms outside CT from EXCESS described in Section 2.2.8.2

MOLTAB - formatted molform from MOLFRM described in Section 2.2.4.2


On output, VERIFY sets bits in the accumulator to indicate the type of error
found if the compound was rejected.

Figure 27. Macro Flow Chart - VERIFY
2.2.15 Expansion Of The Connection Table


Code  Name:  NFCF 

Programmer:  Bruce  Hack 

Nick  Homer 

Abstract: This program expands the connection table from the internal
format to the format acceptable to program CONVRT, and prints the connection
table and abnormality table if a switch is set.

2.2.15.1  Program  Description 

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  28. 

The  connection  table  list  is  broken  down  into  three  lists: 

(1)  The  E  list  -  this  contains  the  atom  name. 

(2)  The  B  list  -  this  contains  the  bond  type. 

(3)  The  X  list  -  this  contains  the  number  of  the  connected  atoms. 

The format of these lists is described in Section 2.1.2.2.

If a switch is set, the lists are printed by the line printer under appropriate
headings, and the abnormality table is decoded and printed. The program
then calls CONVRT and TICKER.
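The split into E, B and X lists can be illustrated with this sketch. The exact list formats are given in Section 2.1.2.2, which is not reproduced here. The in-memory layout used below (one tuple per atom, holding its name and its bonds) is therefore a hypothetical stand-in. The element-symbol check is only a rough syntactic test.

```python
def split_connection_table(ct):
    """Split a connection table into parallel E, B and X lists.

    ct: list of (atom_name, [(bonded_to, bond_type), ...]) tuples, one per atom,
    in connection table number order (a hypothetical in-memory form).
    """
    if not ct:
        raise ValueError("empty connection table")
    e_list, b_list, x_list = [], [], []
    for name, bonds in ct:
        if not (1 <= len(name) <= 2 and name[0].isupper()
                and (len(name) == 1 or name[1].islower())):
            raise ValueError(f"incorrect element symbol {name!r}")
        e_list.append(name)                                   # E: atom name
        b_list.append([bond_type for _, bond_type in bonds])  # B: bond types
        x_list.append([bonded_to for bonded_to, _ in bonds])  # X: connected atoms
    return e_list, b_list, x_list
```

The two rejection checks mirror the NFCF rejection reasons for an empty connection table and an incorrect element symbol.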

2.2.15.2  Program  Structure 

NFCF is a main program which takes the connection table described in
Section 2.2.12.2 as input and provides as output three expanded lists:

X - The connection list

B - The bond list

E - The atom name list

NFCF  rejects  chemical  records  for  the  following  reasons: 

(a)  Empty  connection  table 

(b) Incorrect element symbol in bits 24-35 of CT

NFCF  calls  DECK  A,  a  subroutine,  which  prints  Connection  Table  titles 
when  a  switch  is  set. 
2.2.16 Output Of Chemical Record


Code  Name:  TICKER 

Programmer: Helen Hill

Abstract: TICKER writes an output tape containing the TID, classification
and stereo information, molform, nomenclature and references, structural
formula image, connection table, and abnormality table.

2.2.16.1  Program  Description 


A  macro  flow  chart  describing  this  program  is  presented  in  Figure  29. 

TICKER  uses  a  portion  of  the  MATRIX  into  which  to  transmit  all  the  informa¬ 
tion  required  for  the  output  and  writes  this  block  onto  magnetic  tape,  writing 
one  record  per  chemical  compound.  It  uses  information  provided  by  MOLFRM,  VERIFY, 
and  CONVRT  to  calculate  the  number  of  rings  in  the  compound  and  stores  this  in 
the  output  record.  It  then  writes  the  total  number  of  parity  errors  encountered 
since  the  last  compound  if  the  input  was  from  a  Mergenthaler  typewriter.  If 
switch  2  is  set,  TICKER  calls  PIX  or  DURPIX  to  produce  pictures. 
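The text does not give the formula TICKER uses for the number of rings. A standard choice, assumed here, is the cycle rank of the molecular graph: bonds minus atoms plus the number of disconnected fragments. The fragment count could be, for example, 1 plus ADTOT, if ADTOT counts the addend fragments beyond the parent. `ring_count` is a hypothetical name.

```python
def ring_count(n_atoms: int, n_bonds: int, n_fragments: int = 1) -> int:
    """Number of independent rings (cycle rank) of a molecular graph.

    n_bonds counts each connection once, whatever its bond order, and
    n_fragments is the number of disconnected pieces in the structure.
    """
    rings = n_bonds - n_atoms + n_fragments
    if rings < 0:
        raise ValueError("inconsistent atom, bond and fragment counts")
    return rings
```

Benzene (6 ring atoms, 6 ring bonds, one fragment) gives 1 and naphthalene (10 atoms, 11 bonds) gives 2.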

2.2.16.2  Program  Structure 

TICKER  is  a  subroutine  which  requires  142  core  locations  and  utilizes  the 
MATRIX  area  as  an  output  buffer. 

It  requires  the  following  input: 


MOLTAB - the molecular formula described in Section 2.2.4.2
NOMTAB - the nomenclature described in Section 2.2.5.2
REGNO - the registry number
CLSTER - formatted classification and stereo information described
         in Section 2.2.3.2
CONTOT - the number of words in the connection table
UNDTAB - the underline table described in Section 2.2.3.2
SCRUB - the SFI described in Section 2.2.3.2
CELLB - connection table described in Section 2.1.2
ATIR - the number of words in the abnormality table
ASCRUB - a pointer to the end of the SCRUB list
ADTOT - number of addend fragments in the compound
DELX - x size of matrix
DELY - y size of matrix
RINGS - used to calculate total rings


The output from TICKER is the formatted chemical record described in
Figure 13. This is on magnetic tape, one chemical record per physical record.


2.2.17  Rejection  Of  Incorrect  Records 


Code Name: REJECT
Programmer: John Powers
Bruce Hack

Abstract: This program is transferred to from various portions of
the system. A message is printed out and the program transfers to AEND.

2.2.17.1  Program  Description 

The  following  macro  flow  chart  describes  this  program. 


NOTE:  AEND  transfers  the  program  from  the  generalized  processing 
portion  of  the  system  to  one  of  the  input  programs  for  the 
processing  of  the  next  chemical  record. 


2.2.17.2 Program Structure

REJECT  is  a  subroutine. 
2.2.18 CHEMTYPE to CIDS Format Conversion


Code  Name:  UPTAP 
Programmer:  Ed  Hebei 

Abstract: This program reformats the output of the CHEMTYPE system into CIDS format and merges in the record descriptors which were introduced through punched cards.


2.2.18.1  Program  Description 

UPTAP passes through the following phases: sort, format, card read, print TIDs (temporary identifications), and sort.

The first phase of UPTAP sorts on the two-word TID field of each logical record created by the CHEMTYPE programs. At the start of the program a check is made to see if any record read exceeds 1,000 words. Should a record exceed 1,000 words, it is skipped and its TID is printed.
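A minimal sketch of this first phase, assuming each record is held in memory as a (TID, words) pair. The function name `first_phase` and the string TID (standing in for the two-word binary field) are illustrative, not taken from UPTAP.

```python
from typing import List, Tuple

MAX_WORDS = 1000  # record size limit stated in the text

def first_phase(records: List[Tuple[str, List[int]]]):
    """Drop oversized records, then sort the remainder on TID.

    Returns (sorted_records, skipped_tids).
    """
    kept, skipped = [], []
    for tid, words in records:
        if len(words) > MAX_WORDS:
            skipped.append(tid)  # UPTAP prints the TID of each skipped record
        else:
            kept.append((tid, words))
    kept.sort(key=lambda r: r[0])
    return kept, skipped
```

A record of exactly 1,000 words is kept, since only records that exceed the limit are skipped.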

The next phase inserts the four words from the Molform Table at the beginning of each record. Word five is the base address from which all relative addresses are calculated. The actual word count of the entire CIDS record (excluding the four MF words) and the two's complement relative addresses, within the record, of the Additional Registry Numbers, Abnormality Table, Compound Connection Table, Reference Block, Structural Formula Image, Keys, and Qualifiers are calculated and placed in words 5, 6, 7, and 8 (see Section 2.1.3.2).

Within the Reference Block, a header and a table of contents are created.

The decrement of the header holds the number of table of contents words.

The address portion contains the number of words in the Reference Block, including the header. The decrement of each table of contents word contains the identification number for the type of information. In this portion of the Reference Block, bits 18-20 indicate the type of data: BCD, binary, Modified DURA, or Compressed Modified DURA. The address portion of each word holds the address relative to the header of the Reference Block. The table of contents shown in Section 2.1.3.2 allows the Reference Block to be expanded without difficulty in the future.
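The header and table of contents words can be illustrated with bit-field packing. This sketch assumes the 36-bit layout shown in the Reference Block Format figure (prefix in bits S-2, decrement in bits 3-17, data type in bits 18-20, address in bits 21-35, with bit S leftmost) and holds each word in a Python integer; all helper names are hypothetical.

```python
# Assumed field layout of one 36-bit word (bit S leftmost, bit 35 rightmost):
#   prefix    bits S-2   (3 bits)
#   decrement bits 3-17  (15 bits)
#   type      bits 18-20 (3 bits)
#   address   bits 21-35 (15 bits)

DATA_TYPES = {0: "BCD", 1: "Binary", 2: "Modified DURA",
              3: "Compressed Modified DURA"}

def pack_word(prefix: int, decrement: int, dtype: int, address: int) -> int:
    """Pack the four fields into one 36-bit word."""
    if not (0 <= prefix < 8 and 0 <= dtype < 8):
        raise ValueError("prefix and type are 3-bit fields")
    if not (0 <= decrement < 1 << 15 and 0 <= address < 1 << 15):
        raise ValueError("decrement and address are 15-bit fields")
    return (prefix << 33) | (decrement << 18) | (dtype << 15) | address

def unpack_word(word: int) -> tuple:
    """Return (prefix, decrement, type, address)."""
    return ((word >> 33) & 0o7, (word >> 18) & 0o77777,
            (word >> 15) & 0o7, word & 0o77777)

def header_word(toc_words: int, block_words: int) -> int:
    # Decrement: number of table of contents words.
    # Address: words in the Reference Block, header included.
    return pack_word(0, toc_words, 0, block_words)

def toc_word(info_id: int, dtype: int, rel_addr: int) -> int:
    # Decrement: identification number of the information type.
    # Type: see DATA_TYPES. Address: offset from the header.
    return pack_word(0, info_id, dtype, rel_addr)

w = toc_word(2, 3, 5)
print(unpack_word(w), DATA_TYPES[unpack_word(w)[2]])
```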

The card routine reads a card and compares its TID against the TID of the current record. If the record TID proves to be larger than the card TID, an error message is generated stating that fact. The next card is then read and the same test applied.

If more than one card is present with the same TID, a check is made to ascertain that the cards are in contiguous sequence. If they are not, an error message is generated. Whenever a card is encountered with a new TID, the descriptor on the card for the previous TID is transferred into the current record. Prior to transfer, a check is made to see whether the new TID on the card is greater than the TID of the descriptor being stored. Should the new TID prove to be less, an error message results.
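The card checks above amount to merging the card deck into the TID-sorted records. The sketch below simplifies under stated assumptions: it uses an in-memory dictionary instead of a sequential tape pass, represents TIDs as strings, and the function name and error-message wording are hypothetical.

```python
from typing import Dict, List, Tuple

def merge_descriptors(record_tids: List[str],
                      cards: List[Tuple[str, str]]):
    """Attach punched-card descriptors to records by TID.

    record_tids: TIDs of the formatted records, sorted ascending.
    cards:       (tid, descriptor) pairs in deck order.
    Returns (descriptors keyed by TID, list of error messages).
    """
    attached: Dict[str, List[str]] = {tid: [] for tid in record_tids}
    errors: List[str] = []
    seen = set()
    prev = None
    for tid, desc in cards:
        if tid != prev:                      # start of a new TID group
            if prev is not None and tid < prev:
                errors.append(f"card TID {tid} out of sequence after {prev}")
            if tid in seen:
                errors.append(f"cards for TID {tid} are not contiguous")
            seen.add(tid)
            prev = tid
        if tid in attached:
            attached[tid].append(desc)
        else:                                # card with no matching record
            errors.append(f"no record for card TID {tid}")
    return attached, errors
```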

The TIDs of all records processed are saved on an output tape called 'TIDNP'.


After formatting the CIDS records, UPTAP prints the number of input and output records processed.

In the final phase of the program a sort is performed on two fields: (1) the four Molform Table words and (2) the two-word TID. The sorted records are written on the final output tape for input to the registry system.
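The two-field sort can be expressed as a compound sort key: records are ordered on the Molform words first, and records with the same formula are ordered on the TID. This sketch assumes the words compare as non-negative integers; the `CidsRecord` type is hypothetical.

```python
from typing import List, NamedTuple, Tuple

class CidsRecord(NamedTuple):
    molform: Tuple[int, int, int, int]  # the four Molform Table words
    tid: Tuple[int, int]                # the two-word TID
    body: bytes = b""                   # rest of the record, not a sort field

def final_sort(records: List[CidsRecord]) -> List[CidsRecord]:
    """Order records on the MF words first, then on the TID."""
    return sorted(records, key=lambda r: (r.molform, r.tid))
```

Ordering on the formula first is what lets the registry system compare candidates against the Master Registry File in a single sequential pass.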


REFERENCE BLOCK FORMAT

Word  S-2  3-17 (Decrement)                   18-20  21-35 (Address)
0          No. of words in table of contents         No. of words in reference block (including Word 0)   HEADER
1     +    1  CLSTER **                       1 *    RA to Header of Ref Blk
2     +    2  Nomenclature **                 3 *    RA to Header of Ref Blk
3                                             0 *    RA to Header of Ref Blk
4          CLSTER
5          Nomenclature
.
.
X          EA Number(s)

NOTE:
*  Type of Data: 0 BCD, 1 Binary, 2 Modified DURA, 3 Compressed Modified DURA
** If Decrement is zero, no data is stored


2.2.18.2 Program Structure


UPTAP is a main program.


Figure 30. Macro Flow Chart - UPTAP
2.3 REGISTRATION


The CIDS Registry System examines compound records which are candidates for the file in order to determine which compounds are duplicates of those already in the file. In order to locate duplicates, a Master Registry File is maintained in molecular formula sequence. Potential new compounds are sorted in this same order and compared against the Master Registry File. An atom-by-atom search is performed to compare the structure of each potential registrant with each of its isomers (compounds with identical molecular formulas) in the file. If a connection table match is found, a further test is made to see if the complete records are exact duplicates.
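Because both files are in molecular formula order, the isomers of each candidate can be found in one forward pass, just as two sequential tapes are merged. In this sketch plain strings stand in for the 4-word MF sort key and the connection table, and `find_isomers` is a hypothetical name; the atom-by-atom comparison itself is not modeled.

```python
from typing import List, Tuple

# Toy record: (molecular formula sort key, connection table).
Record = Tuple[str, str]

def find_isomers(candidates: List[Record],
                 master: List[Record]) -> List[Tuple[Record, List[Record]]]:
    """Pair each candidate with the master records sharing its MF key.

    Both lists must already be sorted on the MF key.
    """
    results = []
    j = 0
    for cand in candidates:
        mf = cand[0]
        # Skip master records whose MF sorts before this candidate's.
        while j < len(master) and master[j][0] < mf:
            j += 1
        # Find the end of the run of equal MFs without consuming it,
        # since the next candidate may have the same formula.
        k = j
        while k < len(master) and master[k][0] == mf:
            k += 1
        results.append((cand, master[j:k]))
    return results
```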

Figure 31 shows the general flow of the registration process. The four principal programs, STARTA, HLDPRC, REGUD, and RUD II, are described in the following sections. The principal files involved are the Master Registry file, the Print file, and the Structure file. All utilize the CIDS record format as described in Section 2.1.3.2, but the blocks of data actually stored differ between files.

The  Master  Registry  file  contains: 

Registry  number 

Additional  compound  identification  numbers 
Molecular  formula 

Connection  table  and  abnormality  table 
Reference  block 

The  items  in  the  reference  block  are  listed  in  Section  1.1. 

The Print file is an auxiliary file to the search system. It is maintained separately because the data it contains is not searched, but merely accessed for printing after the answers to a query have been determined. Thus the data it contains does not have to be maintained on the same high speed storage devices as the remainder of the search file. An additional reason for keeping this data separate is that it is likely to be updated, while the rest of the search data remains static. This separation considerably shortens the update process. The items contained in each record on the Print file are:

Registry  number 

Additional  compound  identification  numbers 
Molecular  formula 
Structural  formula  image 
Reference  block 

The Structure file is the input to the key assignment programs described in Section 2.4 and illustrated in Figure 3. The data blocks contained in this file are:


Registry  number 

Molecular  formula 

Connection  table  and  abnormality  table 

Structural keys

The structural keys will be removed from the compound records before they are entered into the search file.

Potential  registrants  to  the  file  are  processed  in  one  of  the  following 
ways  by  program  STARTA: 

(1)  Those  which  have  no  connection  table  matches  in  the  file 
are  immediately  registered. 

(2) Those whose complete records exactly match some compound record on the Master Registry File are discarded.

(3) Those whose connection table matches one or more file compound connection tables, but which do not have a complete record match, are saved for further inspection by a chemist.

Those compounds which have initially been determined to be unique by program STARTA can be immediately prepared for entry into the file by program REGUD, which adds Print information from these records to the Print tape and outputs a structure tape for input to the functional group key assignment programs.

Those compounds in category (3) above are written on a Hold tape by STARTA and must be examined by a chemist before further action can be taken. He must determine for each compound match whether the two are different compounds (in which case they must be stereoisomers), or whether they are the same compound but have some different data in the compound record. This difference may occur because of an error in one of the two records, or because one contains additional data which the other does not.

Program HLDPRC processes the compounds on the Hold tape based on the decisions of the chemist. These compounds are processed in one of the following ways:


(1) Those that are different (stereoisomers) from all their isomers in the file are registered as new compounds.

(2)  Those  that  are  the  same  compound  are 

(a) Ignored if the data already in the file is more complete and more correct than that on the Hold tape.

(b)  Selected  parts  of  the  data  record  are  used  to  update  the 
file  record  if  the  data  on  the  Hold  tape  is  more  complete 
or  more  correct. 
2.3.1 Master Registry Program


Code  Name:  STARTA 

Programmer: Donald Headley

Abstract: STARTA determines which of a group of potential new compounds are different from those already registered in the master file. These compounds are registered, positive matches are discarded, and questionable matches are printed for further examination by a chemist.


2.3.1.1 Program Description

STARTA reads a tape of potential registrants which was produced by program UPTAP (Section 2.2.18). These are sorted according to the 4-word molecular formula sort key which precedes the normal CIDS record. This tape is then compared against the Master Registry File, which is also in MF sequence. Program STRUC (Section 2.4.8) is called to determine if two connection tables match whenever a molecular formula match is found.

If the connection table of the candidate record does not match any records on the Registry Master File, a unique registry number is assigned to the compound, and it is written on the new Registry Master and on the new compound file NREGC. This tape will then be processed by program REGUD.

If the connection table of the candidate compound matches that of one or more file compounds, a further test is made to determine if the entire record is the same. This means that the temporary identification number (TID) of the candidate record must match one of the additional registry numbers of the file compound. As a further check, the stereo indicator and nomenclature must exactly match. If these data fields are all identical, the candidate compound is considered a duplicate and is discarded.

If there was a connection table match but one of the other fields failed to match, then the data from the matching compound records must be printed for examination by a chemist. The compound record for a potential registrant which falls in this category is stored on the Hold tape for later processing by program HLDPRC (Section 2.3.2). A card is punched containing the TID of each compound on the Hold tape. After a decision is made by a chemist, an action code must be punched on each of these cards indicating the type of processing to be performed on each.
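The three-way decision STARTA makes for each potential registrant can be summarized as follows. This is an illustrative sketch: the `Compound` fields are simplified stand-ins for the record blocks named in the text, `ct_match` is a placeholder for the STRUC atom-by-atom comparison, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Compound:
    tid: str                  # TID (candidates) or registry number (file)
    ct: str                   # connection table, toy stand-in
    stereo: str = ""
    nomenclature: str = ""
    additional_ids: List[str] = field(default_factory=list)

def is_duplicate(cand: Compound, file_cmp: Compound) -> bool:
    """Full-record test applied after a connection table match."""
    return (cand.tid in file_cmp.additional_ids
            and cand.stereo == file_cmp.stereo
            and cand.nomenclature == file_cmp.nomenclature)

def classify(cand: Compound, isomers: List[Compound],
             ct_match: Callable[[str, str], bool]) -> str:
    """Return 'register', 'discard', or 'hold' for one potential registrant."""
    matches = [f for f in isomers if ct_match(cand.ct, f.ct)]
    if not matches:
        return "register"     # no CT match: assign the next registry number
    if any(is_duplicate(cand, f) for f in matches):
        return "discard"      # exact duplicate of a file compound
    return "hold"             # CT match only: Hold tape plus TID card
```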


2.3.1.2 Program Structure

STARTA  is  a  main  program  which  requires  subroutine  STRUC.  The  inputs  for 
the  program  are: 

(1) 'INPUT1': Potential new compounds from program UPTAP.

(2)  'MAST2A':  The  old  Registry  Master  File.  For  the  initial 
run,  a  parameter  card  specifies  that  the  file  is  not  present. 
(3) A card with the next registry number to be assigned in columns 25-36.

The outputs of the program are:

(1) 'MAST2B': The new Registry Master File

(2) 'HOLDTP': Hold Tape File. The potential registrants from 'INPUT1' that matched one or more records in the Registry Master File

(3) 'NEWCMP': The new registered compounds.

(4)  A  listing  on  the  printer  of  the  records  on  the  Hold  tape 
with  the  matching  records  from  the  Registry  Master. 

(5)  A  punched  card  for  each  record  on  the  Hold  tape  containing 
TID  number  and  a  card  sequence  number. 

(6) A card with the next registry number to be assigned in
columns  25-36. 

All files have IOBS type 2 format, with a maximum block size of 1000 words.


2.3.1.3 Operating Instructions

For  execution,  tapes  must  be  mounted  as  follows: 


'INPUT1'    S.SU07     (B6)
'MAST2A'    S.SU04     (C4)
'MAST2B'    S.SU05     (B5)
'HOLDTP'    s . sur    (B4)
'NEWCMP'    S.Sv oO    (B3)
Scratch     S.SU02     (C2)
Scratch     S.SU01     (C3)

For multi-reel operation, a message will print at the end of each input tape. Sense switch 2 must be set to signal the last input reel for file 'INPUT1'. Sense switch 3 must be set to signal the last reel for file 'MAST2A'.
2.3.2 Hold Tape Processor


Code Name: HLDPRC

Programmer: David Sherr

Abstract: HLDPRC processes the Hold tape produced by program STARTA according to an action code punched on each card of the TID card deck produced by STARTA. The action codes are the results of a chemist's decision to register, ignore, or update the compound record for each compound on the Hold tape.


2.3.2.1 Program Description

As each compound record is read from the Hold tape, the next card in the card input is checked to see if it contains the same TID. If it does, the compound record is processed according to the action code punched on the card. The present allowable action codes are:

(1)  Ignore  record 

(2)  Register  as  new  compound 

(3)  Replace  nomenclature. 

Action  code  3  is  presently  the  only  type  of  update  available.  In  this  case 
the  registry  number  of  the  compound  whose  nomenclature  is  being  replaced  is 
also  punched  on  the  TID  cards.  Other  types  of  updates  are  in  the  process  of 
being  implemented. 

Compounds that are to be registered are assigned the next available registry number, and the record is written in sequence on the new Registry Master and on 'NREGC', the tape of new compounds to be input to RUD II. Updates are also written on 'NREGC' after the compound record on the new Registry Master is updated.
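A sketch of the action-code dispatch described above. It assumes a toy Registry Master held as a dictionary from registry number to nomenclature and represents each TID card as a (TID, action code, target registry number) tuple; these structures and the name `process_hold` are hypothetical.

```python
from typing import Dict, List, Tuple

def process_hold(hold: List[Tuple[str, str]],
                 cards: List[Tuple[str, int, str]],
                 master: Dict[str, str],
                 next_regno: int) -> Tuple[int, List[str]]:
    """Apply the chemist's action codes to Hold tape records.

    hold:   (TID, nomenclature) records, in Hold tape order.
    cards:  (TID, action code, target registry number), same order.
    master: registry number -> nomenclature (toy Registry Master).
    Returns the next free registry number and the registry numbers
    written to the new-compound/update tape.
    """
    if len(hold) != len(cards):
        raise ValueError("card deck and Hold tape differ in length")
    changed: List[str] = []
    for (tid, name), (card_tid, action, target) in zip(hold, cards):
        if tid != card_tid:
            raise ValueError(f"card TID {card_tid} does not match record {tid}")
        if action == 1:            # ignore record
            continue
        if action == 2:            # register as new compound
            regno = str(next_regno)
            next_regno += 1
            master[regno] = name
            changed.append(regno)
        elif action == 3:          # replace nomenclature of an existing compound
            if target not in master:
                raise ValueError(f"unknown registry number {target}")
            master[target] = name
            changed.append(target)
        else:
            raise ValueError(f"unknown action code {action}")
    return next_regno, changed
```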

A macro flow chart of the program is presented in Figure 32.


2.3.2.2 Program Structure

HLDPRC  is  a  main  program  which  requires  as  input: 

(1)  'HOLD':  The  Hold  tape 

(2)  'OLMAS':  The  old  Registry  Master  File 

(3)  A  card  with  the  next  registry  number  to  be  assigned  in 
columns  25-36 

(4) The TID card deck from STARTA with action codes punched in columns 13-18, right-justified.
Figure 32. Macro Flow Chart - HLDPRC


The  outputs  produced  are: 


(1)  'NWMAS':  The  new  Registry  Master  File 

(2)  'INTER':  An  intermediate  tape  for  RUD  II  containing 
new compounds and updates.

(3)  A  card  with  the  next  registry  number  to  be  assigned  in 
columns  25-36. 

All files contain IOBS type 2 records in CIDS record format.


2.3.2.3 Operating Instructions

For  execution,  tapes  must  be  mounted  as  follows: 

'HOLD'  S.SU05 

'OLMAS'  S.SU04 

'NWMAS'  S.SU07 

'INTER'  S.SU06 

When  the  last  reel  of  the  'OLMAS'  file  is  mounted,  sense  switch  5  must  be  set. 
Similarly,  sense  switch  4  must  be  set  when  the  last  'HOLD'  reel  is  mounted. 

The first input card must have the next registry number punched in columns 25-36 and must be followed by the action cards (output from STARTA). An end-of-data card with END punched in columns 1-3 must immediately follow these cards. The action cards are sequenced and must remain in that order.
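The deck layout above maps directly onto fixed-column slices. This sketch assumes each card is available as an 80-character string image; since the text does not say where the TID sits on an action card, the whole card image is kept alongside the parsed action code. `parse_deck` is a hypothetical name.

```python
from typing import List, Tuple

def parse_deck(cards: List[str]) -> Tuple[int, List[Tuple[str, int]]]:
    """Split an HLDPRC card deck into (next registry number, action cards).

    Card columns are 1-based, so column n is string index n - 1.
    """
    images = [c.ljust(80) for c in cards]
    if not images:
        raise ValueError("empty deck")
    next_regno = int(images[0][24:36])            # columns 25-36
    actions: List[Tuple[str, int]] = []
    for card in images[1:]:
        if card[0:3] == "END":                    # end-of-data card, columns 1-3
            return next_regno, actions
        actions.append((card, int(card[12:18])))  # action code, columns 13-18
    raise ValueError("no END card after the action cards")
```

Python's `int` ignores the leading blanks of a right-justified field, so no extra trimming is needed.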
Code Name: REGUD


Programmer:  David  Sherr 

Abstract:  REGUD  updates  the  Print  tape  by  adding  new  records  for  a  group 
of  newly  registered  compounds. 


2.3.3.1 Program Description

REGUD reads a tape of newly registered compounds. These records are then sorted by registry number in order to add new records to the Print tape, which is in registry number sequence. The program reads the last reel of the old Print tape and checks that its last registry number is smaller than the first of the new records. If not, an error is indicated. A new Print tape is written with the new Print records added to the tape.
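A sketch of the REGUD sequence check, assuming the Print file is modeled as an in-memory list of (registry number, data) pairs rather than tape reels; `regud_append` and the tuple format are hypothetical.

```python
from typing import List, Tuple

PrintRecord = Tuple[int, str]  # (registry number, Print data), toy model

def regud_append(old_print: List[PrintRecord],
                 new_records: List[PrintRecord]) -> List[PrintRecord]:
    """Append newly registered compounds to a Print file in regno order."""
    new_sorted = sorted(new_records, key=lambda r: r[0])
    if old_print and new_sorted and old_print[-1][0] >= new_sorted[0][0]:
        raise ValueError(
            f"last old registry number {old_print[-1][0]} is not smaller "
            f"than first new number {new_sorted[0][0]}")
    return old_print + new_sorted
```

Only the last old record takes part in the check, which is why REGUD needs only the last reel of the old Print tape.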


2.3.3.2 Program Structure

REGUD  is  a  main  program  which  requires  as  input: 

'OLD':  Old  Print  tape 

'NREGC':  Tape  of  new  registered  compounds 

Data card containing number of reels in the 'OLD' file.

The  outputs  produced  are: 

'PRNT': New Print tape
'STRUC': Structure tape

Data card containing number of reels in the 'PRNT' file

All tapes are written in IOBS type 2 records, and the logical records are in the CIDS record format described in Section 2.1.3.2.

The input tape 'NREGC' was produced by program STARTA (Section 2.3.1).
The  output  tape  'STRUC'  is  the  input  to  the  key  assignment  programs  described 
in  Section  2.4. 


2.3.3.3 Operating Instructions

The last reel of the 'OLD' file must be mounted on S.SU04 with SS5 set.

An input card with the number of reels in the 'OLD' file must follow the $ENTRY card. The first 'NREGC' file tape must be mounted on S.SU05. Typewriter messages call for successive reels. When the last 'NREGC' tape is mounted, SS 4 must be set. When the program ends, it prints record counts for each file and punches a card with the number of reels in the 'PRNT' file.
2.3.4 Registry Print Tape Update II
Code  Name:  RUD  II 
Programmer:  David  Sherr 

Abstract: RUD II updates the Print tape by adding new records for a group of newly registered compounds and updating records corresponding to previously registered compounds.


2.3.4.1 Program Description

RUD II reads a tape produced by program HLDPRC (Section 2.3.2) which contains newly registered compounds and Print data for addition to or replacement of Print records for previously registered compounds. These records are sorted by registry number for merger with the Print tape.

RUD  II  must  pass  the  entire  old  Print  tape  in  order  to  update  or  replace 
records  which  have  been  changed.  When  the  end  of  the  file  is  read,  records 
corresponding  to  newly  registered  compounds  (which  automatically  have  larger 
registry  numbers)  are  added  to  the  file. 
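The RUD II pass is a sequential master-file update: changed records are replaced as the old file is copied, and the remaining transactions, which carry higher registry numbers, are appended at the end. The sketch assumes at most one change per registry number and treats every update as a whole-record replacement (the text also mentions additions to existing Print records, which are not modeled); `rud2` and the tuple format are hypothetical.

```python
from typing import List, Tuple

PrintRecord = Tuple[int, str]  # (registry number, Print data), toy model

def rud2(old_print: List[PrintRecord],
         changes: List[PrintRecord]) -> List[PrintRecord]:
    """Merge registry-number-sorted changes into the old Print file."""
    pending = sorted(changes, key=lambda r: r[0])
    out: List[PrintRecord] = []
    i = 0
    for regno, data in old_print:
        if i < len(pending) and pending[i][0] < regno:
            raise ValueError(f"no Print record for registry number {pending[i][0]}")
        if i < len(pending) and pending[i][0] == regno:
            out.append(pending[i])     # replace the changed record
            i += 1
        else:
            out.append((regno, data))  # copy the unchanged record
    # Whatever remains lies beyond the old file: newly registered compounds.
    out.extend(pending[i:])
    return out
```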


2.3.4.2 Program Structure

RUD II requires the same inputs as REGUD (Section 2.3.3), except that file 'NREGC', which is in this case produced by program HLDPRC (Section 2.3.2), contains both newly registered compounds and updates for previously registered compounds. The outputs produced are the same as those by REGUD.


2.3.4.3 Operating Instructions

The  operating  procedure  is  the  same  as  for  program  REGUD  except  that  all 
reels  of  the  old  Print  tape  must  be  read  and  updated. 
2.4 KEY ASSIGNMENT


The Key Assignment System is broken into two subsystems, (1) the ring key assignment system and (2) the specific fragment and miscellaneous key assignment system. The programs are broken up into two groups because of the large core requirements of each.

The  ring  key  assignment  programs  analyze  the  ring  systems  of  a  structure 
from  its  connection  table  and  automatically  assign  the  appropriate  CIDS  generic 
ring  keys  as  described  in  CIDS  No.  4.  In  addition,  if  the  compound  connection 
tables  being  processed  do  not  have  ring  atoms  and  ring  bonds  explicitly  marked, 
then  these  programs  also  perform  this  function.  For  this  class  of  data,  it  is 
necessary to perform ring analysis before assigning specific fragment and acyclic keys. Otherwise, the order of processing is unimportant. The programs which make up the ring key assignment system are SCNCAS, SCRNCR, SCRNDR, RING1, RING2, RING3, and RING4. SCRNCR and SCRNDR are two versions of the same program, the first being used for data which previously had ring bonds and ring atoms explicitly marked and the second for data which does not yet contain these indicators.

The remainder of the key assignment programs are grouped together. The programs comprising this system are SCNCAS, SCREEN, STRUC, HCRCT, BONDCT, MFSRN, and PSCKYT. The executive program SCNCAS is the same for both systems. SCREEN is the sub-executive program, which serves a function similar to that of SCRNCR and SCRNDR in the ring key assignment system. Each of the other subroutines assigns some
particular  type  of  key  to  a  compound.  Program  STRUC  is  called  to  perform  an 
atom-by-atom  search  to  determine  if  a  particular  functional  group  or  hydrocarbon 
radical  fragment  is  present  in  a  compound.  Program  HCRCT  assigns  a  key  when  a 
compound  is  found  to  contain  a  nonspecific  hydrocarbon  radical  as  described  in 
Section  2.4.9.  Program  BONDCT  assigns  the  CIDS  acyclic  nuclei  keys  described  in 
CIDS  No.  4.  Program  MFSRN  assigns  molecular  formula  keys.  Program  PSCKYT  assigns 
non-specific  phosphorus  functional  group  keys. 


2.4.1  Key  Assignment  Executive 


Code  Name:  SCNCAS 

Programmer:  Ruth  V.  Powers 

Abstract: SCNCAS is the executive for the Key Assignment programs. For each compound on the input tape, the sub-executive program is called, which in turn calls the appropriate screening subroutines. SCNCAS writes the compound record on tape in the same format with the newly assigned keys added to the record. The program has been written so as to provide flexibility in restarting the screening.

2.4.1.1 Program Description

SCNCAS  first  calls  a  subroutine  which  reads  the  fragment  screens  (output  of 
SLOAD)  if  it  is  used  with  the  fragment  screening  programs. 

A data card is then read which provides the parameters for restarting the program. The first number on the card gives the registry number of the last compound which has already been processed from the input file. SCNCAS skips to this compound on the tape and begins processing with the following compound. If this first number is zero, processing begins with the first compound on the tape.


The  second  number  on  the  data  card  gives  the  registry  number  of  the  last 
compound  processed  on  the  previous  output  tape  if  it  is  desired  to  add  to  it. 
This  number  is  zero  if  a  new  output  tape  is  to  be  started. 

As  each  compound  record  is  read  from  the  input  tape,  pointers  are  set  to 
the  locations  of  the  molecular  formula  and  connection  tables.  One  of  several 
possible  sub-executive  programs  is  then  called,  depending  on  the  type  of  key 
assignment  to  be  done.  When  control  is  returned  to  SCNCAS,  a  test  is  made  to 
see  if  the  compound  was  successfully  screened.  If  so,  the  keys  are  added  to 
the  record  and  it  is  rewritten  on  the  output  tape.  If  any  type  of  error  is 
encountered,  or  if  the  size  of  the  compound  exceeds  some  program  limitation,  it 
is  written  on  a  Reject  Tape. 

Processing is halted and all files closed when either an end-of-file is encountered on the input tape or sense switch 5 is pressed in at the console.
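The restart and reject logic of SCNCAS can be sketched as below. Compounds are modeled as dictionaries with hypothetical `regno` and `ct` fields, registry numbers as integers, and the sub-executive as a callback that returns a list of key numbers or `None` on failure. The second data-card number (appending to a previous output tape) and the sense-switch halt are not modeled.

```python
from typing import Callable, List, Optional, Tuple

def run_scncas(compounds: List[dict], restart_after: int,
               screen: Callable[[dict], Optional[List[int]]]
               ) -> Tuple[List[dict], List[dict]]:
    """Return (output tape records, Reject tape records)."""
    start = 0
    if restart_after != 0:             # zero means start at the first compound
        for idx, comp in enumerate(compounds):
            if comp["regno"] == restart_after:
                start = idx + 1        # resume with the following compound
                break
        else:
            raise ValueError(f"compound {restart_after} not found on input tape")
    output, rejects = [], []
    for comp in compounds[start:]:
        if not comp.get("ct"):
            rejects.append(comp)       # cf. the "NO C.T." message
            continue
        keys = screen(comp)
        if keys is None:               # error or program size limitation
            rejects.append(comp)
        else:
            output.append({**comp, "keys": keys})
    return output, rejects
```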

A  macro  flow  chart  of  the  program  is  presented  in  Figure  33. 

2.4.1.2 Program Structure

SCNCAS  is  the  main  program  of  the  Key  Assignment  system.  It  calls  one  of 
the  subexecutive  programs  SCREEN,  SCRNCR,  or  SCRNDR,  depending  on  the  phase  of 
Key  Assignment  to  be  accomplished. 
The input to SCNCAS is the tape of compounds in the CIDS record format. This format is described in Section 2.1.3. In addition, for the assignment of functional group and hydrocarbon radical keys, the tape of screen fragments (output of SLOAD) must be loaded into core. The output of SCNCAS is a compound tape
in  the  same  format  as  the  input  tape,  with  any  newly  assigned  keys  added  to  the 
record. 


SCNCAS  causes  the  following  messages  to  be  typed  for  compounds  that  have 
been  rejected: 

(1)  NO  C.T.  XXXXXXXXXXXX. 

Compound  record  contains  no  connection  table  because  of  some 
error  in  entering  the  compound  in  the  file. 

(2) XXXXXXXXXXXX REJECTED. BOND TABLE TOO LONG.

Number of bonds in the compound exceeds a program limitation.


Further  error  messages  are  described  in  subroutine  descriptions. 


2.4.1.3 Operator Instructions


Tapes  for  the  key  assignment  programs  are  mounted  as  follows: 


Compound  Input  Tape  S.SU06 
Fragment  Screens  (if  needed)  S.SU05 
Previous  Output  Tape  (if  needed)  S.SU04 
Output  Tape  S.SU07 
Reject  Tape  S.SU10 


Sense  switch  2  is  pressed  in  to  print  the  keys  that  have  been  assigned. 
(Otherwise  only  the  registry  numbers  of  the  compounds  processed  are  printed.) 
Pressing  sense  switch  5  in  causes  the  program  to  halt  processing  and  close  all 
files.  See  subroutines  for  other  switch  settings. 


2.4.2 Key Assignment Sub-Executive


Code Names: SCREEN, SCRNCR, SCRNDR

Programmer: Ruth V. Powers

Abstract: This program serves as part of the Key Assignment Executive. It is a subroutine of program SCNCAS and acts as an intermediary between it and the various key assignment subroutines. SCREEN, the version used when hydrocarbon radical and functional group fragment keys are being assigned, selects the particular screen fragments which must be applied to the compound being screened.

2.4.2.1 Program Description

Three versions of the Screen Assignment Sub-Executive program exist, for use with different data and different types of key assignment. All three are called by program SCNCAS and serve to initialize counts, print the registry number of the compound being processed, and transmit to the screening subroutines the address of the compound connection table and its length.

Programs SCRNCR and SCRNDR are employed for the assignment of the generic cyclic nuclei keys. They in turn call the ring analysis subroutines to assign these keys. The only difference between these two versions is that in SCRNCR a location CASWCH is set minus, and in SCRNDR it is set plus. SCRNDR is used when the connection table data being processed does not yet have the ring bonds, ring atoms, and resonant bonds marked. In this case, the ring analysis programs perform this function. SCRNCR is used when CAS data is being processed, in which case these marks have already been recorded. A flow chart for these two programs is presented in Figure 34.

Program SCREEN is used when hydrocarbon radical and functional group fragment keys are being assigned. It serves the function of selecting the particular screen fragments which must be applied to the compound being screened.

The  program  selects  the  fragments  and  points  to  each  of  them  in  turn  while 
calling  the  atom-by-atom  search  program  (STRUC)  to  decide  whether  the  fragment 
is  present  in  the  compound. 

If the result of an atom-by-atom search is affirmative, STRUC is called again to determine if the fragment is present in another part of the compound.

When STRUC locates a fragment in the file connection table (C.T.), the atoms which correspond to those in the fragment are "erased" from the core buffer so that they are not available as possible choices on the next attempt to locate the fragment in the compound C.T. In this way the total number of occurrences of the fragment can be determined. This "erasure" is carried through the entire assignment of functional group keys. Since the fragments are tested for assignment in order from largest to smallest, if one functional group appears within a larger functional group, only the key corresponding to the larger one will be assigned.
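The effect of erasure combined with largest-first testing can be shown with a toy model. Purely as an assumption for illustration, a compound here is a linear string of atom symbols, and a fragment "matches" a contiguous run of still-available atoms; real STRUC searches operate on connection-table graphs. The function names are hypothetical.

```python
from typing import Dict, List, Optional

def _find(atoms: List[Optional[str]], frag: str) -> int:
    """Index of the first run of available atoms spelling frag, or -1."""
    k = len(frag)
    for start in range(len(atoms) - k + 1):
        if all(atoms[start + i] == frag[i] for i in range(k)):
            return start
    return -1

def assign_fragment_keys(compound: str, fragments: Dict[int, str]) -> Dict[int, int]:
    """Count occurrences per key, testing larger fragments first."""
    atoms: List[Optional[str]] = list(compound)
    counts: Dict[int, int] = {}
    for key, frag in sorted(fragments.items(), key=lambda kv: -len(kv[1])):
        if not frag:
            continue
        n = 0
        pos = _find(atoms, frag)
        while pos >= 0:
            for i in range(pos, pos + len(frag)):
                atoms[i] = None      # "erase" matched atoms for all later searches
            n += 1
            pos = _find(atoms, frag)
        if n:
            counts[key] = n
    return counts

print(assign_fragment_keys("CCOOCCO", {1: "COO", 2: "CO"}))
```

Because the atoms of the `COO` group are erased first, the `CO` inside it is not counted again, so key 2 is assigned for one occurrence rather than two.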

The selection of fragments which are potential keys for a particular compound is based on the molecular formula of the compound. The first level of discrimination is based on the kind of elements present in the compound. This information is used to select the fragment groups which contain no elements other than those present in the file compound. The CROSS table produced by program SLOAD (Section 2.4.3) is consulted to find those fragment groups which contain acceptable elements for a particular compound.

Figure 34. Macro Flow Chart - SCRNCR, SCRNDR

After a group has been selected, each fragment in the group is submitted to a molecular formula test, which requires that the number of atoms of each element in the fragment be less than or equal to the number of atoms of the corresponding element in the file compound. If this test is passed, an atom-by-atom search is then performed. If the fragment passes this search, it is assigned as a key to the compound, and the key number that represents it is stored in the key section of the compound record.
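The two inexpensive filters that precede the atom-by-atom search can be sketched as follows. This is a hypothetical illustration: the element set of a group is represented as a Python set rather than as bits of a CROSS title word, and molecular formulas are plain `Counter` objects.

```python
from collections import Counter


def group_is_candidate(group_elements, compound_elements):
    """A fragment group qualifies only if it uses no element absent from the compound."""
    return set(group_elements) <= set(compound_elements)


def formula_fits(fragment_formula, compound_formula):
    """Molecular formula test: every element count in the fragment must be less than
    or equal to the count of the same element in the compound (missing counts are 0)."""
    return all(compound_formula[el] >= n for el, n in fragment_formula.items())


compound = Counter({"C": 2, "H": 4, "O": 2})              # e.g. C2H4O2
print(group_is_candidate({"C", "O"}, compound))            # True
print(group_is_candidate({"C", "N", "O"}, compound))       # False
print(formula_fits(Counter({"C": 1, "O": 2}), compound))   # True
print(formula_fits(Counter({"C": 3}), compound))           # False
```

Only fragments that survive both checks would go on to the comparatively expensive atom-by-atom search.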

When the last screen of the candidate group has been tested, SCREEN returns to the scan of the CROSS array to find the next candidate group. When the scan of CROSS has been completed, all appropriate fragment keys have been assigned.

SCREEN also calls program HCRCT (Section 2.4.9) to assign nonspecific hydrocarbon radical keys, program BONDCT (Section 2.4.10) to assign acyclic keys, program MFSRN (Section 2.4.11) to assign keys based on the Hill molecular formula, and program PSCKYT (Section 2.4.12) to assign nonspecific phosphorus functional group keys. Control is then returned to SCNCAS.

Figure  35  presents  a  macro  flow  chart  of  SCREEN. 

2.4.2.2 Program Structure

This  program  is  a  subroutine  of  program  SCNCAS  in  the  Key  Assignment 
system.  It  requires  the  following  input  data: 

RINGCT--contains the total ring count of the compound

TOCNO--2-word array containing the registry number in BCD

DOCAD--decrement contains the address of the compound C.T.

COUNTS--contains the number of words in the compound C.T.

The  program  prints  the  registry  number  of  all  compounds  processed.  In 
addition,  whenever  sense  switch  2  is  pressed  in,  the  keys  which  were  assigned 
to  the  compound  are  printed. 
Figure 35. Macro Flow Chart - SCREEN

2.4.3 Loading of Structural Fragment Screens


Code  Name:  SLOAD 
Programmer:  Ruth  V.  Powers 

Abstract: Program SLOAD prepares structural fragment data for use by the screen assignment program. Connection table data defining the specific functional groups and hydrocarbon radicals presented in CIDS No. 4 are read from cards. The fragments are converted to the proper format and stored in groups based on the presence or absence of certain important elements. The fragment groups are written on tape together with an index to the location of each group.

2.4.3.1 Program Description

The cards containing fragment screens are divided into groups on the basis of the presence of certain important elements before they are loaded into the computer. Preceding each group is a title card that names the group. The name is composed of the combination of the following elements that are present in the fragments of the group: C, N, O, P, S, and X, where X is one of the halogens (F, Cl, Br, or I). Examples of titles are CNO, CX, and N. In addition, there is one group composed of fragments that contain none of these elements. The groups are ordered so that those with the longest titles appear first. Within each group the fragments are ordered by size (number of atoms), with the larger fragments appearing first.

As the data are read from cards, the molecular formula, connection table, and abnormalities associated with each fragment are converted to the CIDS internal formats and stored in an 8000-word buffer. Program CONVRT (Section 2.1.2) is called to format the connection table.

When another title card is encountered in the input data, the end of a group has been reached. A word of zeros followed by a word of binary ones is stored after the last fragment of the group in the output buffer. The next group is stored immediately following the word of ones in the buffer.

Whenever the title card of a group is read from the input file, the current buffer address is stored as the location of the beginning of that group. This information is stored as a two-word entry in the index table (CROSS) in the format:

Word   Contents
1      Title word of group
2      Location of first screen of group, relative to beginning of buffer

The title word indicates the contents of the block by bits in certain positions. The first six bit positions (S, 1-5) indicate the presence or absence of C, N, O, P, S, and X, in that order.
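A title word can be sketched as a 36-bit integer. The sketch assumes, following the usual IBM 7090 convention, that position S is the leftmost (most significant) bit and positions 1-5 follow it; the names `title_word`, `TITLE_ELEMENTS`, and `HALOGENS` are hypothetical.

```python
TITLE_ELEMENTS = "CNOPSX"            # order of bit positions S, 1, 2, 3, 4, 5
HALOGENS = {"F", "Cl", "Br", "I"}


def title_word(elements):
    """Build a 36-bit title word; position S is taken as the most significant bit."""
    elements = set(elements)
    word = 0
    for position, symbol in enumerate(TITLE_ELEMENTS):
        present = bool(elements & HALOGENS) if symbol == "X" else symbol in elements
        if present:
            word |= 1 << (35 - position)
    return word


print(format(title_word({"C", "N", "O"}), "012o"))  # 700000000000
print(format(title_word({"C", "Cl"}), "012o"))       # 410000000000
```

Printing the word in 12-digit octal matches the way full words are written elsewhere in this report.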

Screen groups can be separated into larger blocks (corresponding to different fragment types) in order to interrupt the erasure process and to restore the connection table atoms that were erased when earlier fragments were assigned to the compound. New values can be given to ERASE at this time to alter the atoms to be erased. ERASE can take on the following values:

0  -  No  erasure 

1  -  Erase  non-carbons  only 

2  -  Erase  carbons  only 

This  can  be  accomplished  by  placing  a  card  with  'REPLAC'  in  columns  1  to  6 
and  '00X000'  in  7  to  12  where  X  is  the  value  to  be  assigned  to  ERASE.  These 
cards  are  inserted  before  group  title  cards,  and  cause  a  two  word  entry  in  the 
CROSS  array  of  the  form: 


777777777777

00000XYYYYYY

where  X  is  the  new  value  for  ERASE  and  YYYYYY  is  the  address  of  the  previous 
screen  group. 
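The REPLAC entry can be sketched as follows, assuming the second word has the octal layout 00000XYYYYYY (X in the sixth octal digit, the address in the low six digits). The function names are hypothetical.

```python
ALL_ONES = 0o777777777777   # 36-bit word of binary ones flags a REPLAC entry


def replac_entry(erase, previous_group_address):
    """Two-word CROSS entry for a REPLAC card: ERASE value X and address YYYYYY."""
    if erase not in (0, 1, 2):
        raise ValueError("ERASE must be 0, 1, or 2")
    if not 0 <= previous_group_address < 8 ** 6:
        raise ValueError("address must fit in six octal digits")
    return ALL_ONES, (erase << 18) | previous_group_address


def parse_replac_card(card, previous_group_address):
    """Card text: 'REPLAC' in columns 1-6 and '00X000' in columns 7-12."""
    if card[:6] != "REPLAC":
        raise ValueError("not a REPLAC card")
    return replac_entry(int(card[8]), previous_group_address)  # column 9 holds X


first, second = parse_replac_card("REPLAC002000", 0o1234)
print(format(first, "012o"), format(second, "012o"))  # 777777777777 000002001234
```

A scan of CROSS could distinguish such entries from ordinary group entries by testing the first word for all ones.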

2.4.3.2 Program Structure

The  data  cards  for  the  CIDS  screens  are  ordered  as  follows: 

REPLAC002000
C       (Title card)
        (Hydrocarbon Radicals)
REPLAC001000
CNOP    (Title card)
        (Functional Groups Containing C, N, O, P)
CNOS    (Title card)
        (Functional Groups Containing C, N, O, S)
  .
  .
        (Rest of Functional Groups)
000000  (End Card)

The functional groups must be assigned last so that the connection table retains the proper erasure for processing by PSCKYT and NONSPC for the assignment of nonspecific functional groups.

The input for each structural fragment consists of data cards containing the following information:

Key  Number 
Molecular  Formula 
Connection  Table 
Abnormalities  (if  any) 

The key number is a 6-character number punched in the first 6 columns of the card. The formats for the mol form and C.T. cards can be found in CIDS No. 3 (pages 164 and 165), with the exception that column 24 of the first mol form card now indicates whether any abnormalities are present. Each abnormality has the form "XY=Z", where X is the abnormality type, Y is the atom number, and Z is the value of the abnormality. The abnormality types are V (Valence), C (Charge), and M (Mass). Examples: V1=5.C1=+1.M5=14.
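Parsing the "XY=Z" entries can be sketched with a regular expression. The separator between entries is an assumption here, so the pattern simply ignores whatever characters lie between entries; `parse_abnormalities` is a hypothetical name.

```python
import re

# One abnormality: type (V, C, or M), atom number, signed value, e.g. "C1=+1"
ABNORMALITY = re.compile(r"([VCM])(\d+)=([+-]?\d+)")


def parse_abnormalities(text):
    """Return (type, atom number, value) triples found in an abnormality field."""
    return [(kind, int(atom), int(value)) for kind, atom, value in ABNORMALITY.findall(text)]


print(parse_abnormalities("V1=5.C1=+1.M5=14."))
# [('V', 1, 5), ('C', 1, 1), ('M', 5, 14)]
```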

The data for the next fragment follow immediately, except when control cards are needed to separate groups or blocks. See the discussion in Section 2.4.3.1.


The output of SLOAD is a tape on which the first physical record contains the fragment data. The data associated with each fragment are stored in the following format (D and A denote the decrement and address portions of a word):

Word    Contents
1       D = No. of words in fragment record
        A = No. of words preceding structure (m)
2       D = No. of words preceding abnormality table (= 0 if no abnormalities)
        A = No. of words in structure
3       Molecular formula
 .
m       Key number
m+1     Connection table
 .
n       Abnormality table, if needed (a zero word follows the last entry)


The internal format of the mol form is the same as that for a query mol form as described in Section 3.2.2.2. The internal format of the C.T. is given in Section 2.1.2.2.
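Packing two counts into one word, as in words 1 and 2 of the fragment record, can be sketched as below. The sketch assumes the standard IBM 7090 field layout (decrement in bits 3-17, address in bits 21-35, bit 35 least significant); the function names are hypothetical.

```python
MASK15 = 0o77777  # 15-bit field


def pack_decrement_address(decrement, address):
    """Place one count in the decrement (bits 3-17) and another in the address (bits 21-35)."""
    return ((decrement & MASK15) << 18) | (address & MASK15)


def unpack_decrement_address(word):
    return (word >> 18) & MASK15, word & MASK15


# Word 1 of a hypothetical fragment record: 40 words in the record, m = 5
word1 = pack_decrement_address(40, 5)
print(format(word1, "012o"))              # 000050000005
print(unpack_decrement_address(word1))    # (40, 5)
```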

The index is written as the second record on the output tape. The decrement of the first word of this record contains the complement of the number of words in the index. This word is to be read into location CROSCT, immediately preceding CROSS, when the tape is read for screen assignment.

When the program is run, an output tape must be loaded on S.SU05. The data cards are loaded after the program and must be terminated by a card containing '000000' in the first six columns.
2.4.4 Ring Analysis Executive


Code Name: RING1

Programmer: Jeffrey W. Kulick

Abstract: The general function of RING1 is to find the smallest set of smallest cycles in a compound, patterned after the rules of the Ring Index. These cycles are determined and the generic cyclic nuclei keys are assigned to the compound. The keys themselves are discussed in CIDS No. 4, and the algorithm for assigning them is discussed below.

2.4.4.1 Program Description

In the process of finding the smallest set of smallest rings (hereafter called SSSR), there are two basic processes, "compression" and "cycle finding". Compression involves a number of passes over the compound record. On each pass a rule is applied until it cannot be applied any further. Control then passes to a succeeding rule, which is again applied as many times as possible.

Compression rule I is called "side-chain peeling". This involves the removal of any node with only one connection, together with its redundant entry. Three passes over a typical compound are shown below:

[Figure: successive side-chain peeling passes over a typical compound, Pass 1 to Pass 3]
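Side-chain peeling can be sketched on an adjacency-set representation of the structure. This is an illustrative sketch, not the RING3 code; `peel_side_chains` is a hypothetical name, and each loop iteration removes all atoms that are currently terminal, which corresponds to one pass.

```python
def peel_side_chains(adjacency):
    """Compression rule I: repeatedly remove atoms with at most one connection.
    Returns the remaining adjacency and the number of passes that removed something."""
    adj = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    passes = 0
    while True:
        leaves = [atom for atom, nbrs in adj.items() if len(nbrs) <= 1]
        if not leaves:
            return adj, passes
        passes += 1
        for atom in leaves:
            for nbr in adj.pop(atom):
                if nbr in adj:
                    adj[nbr].discard(atom)


# Propylcyclohexane: ring 1-6 with the chain 1-7-8-9
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7), (7, 8), (8, 9)]
adj = {}
for a, b in bonds:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
ring, n = peel_side_chains(adj)
print(sorted(ring), n)  # [1, 2, 3, 4, 5, 6] 3
```

A three-atom side chain needs exactly three passes, one atom per pass.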


Compression rule II is called "compression over two-nodes". This means simply that any node with exactly two connections is removed from the table, and the two attached nodes are shown as being connected by a longer chain. This is in addition to the standard CIDS compression over all carbon two-nodes. Typical compounds before and after final compression look as follows:

[Figure: typical compounds in INITIAL and FINAL (fully compressed) form]
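Compression over two-nodes can be sketched on a multigraph whose edges carry the atoms already compressed out of each chain. This is a hypothetical illustration, not the RING3 code: each edge is `(u, v, interior)`, and an edge with `u == v` represents a self-ring.

```python
def compress_two_nodes(edges):
    """Compression rule II on a multigraph of paths.

    edges: list of (u, v, interior) where interior is the tuple of atoms already
    compressed out of the path u...v. A node lying on exactly two distinct paths
    is removed and the two paths are joined; an edge with u == v is a self-ring.
    """
    edges = list(edges)
    while True:
        for node in sorted({x for u, v, _ in edges for x in (u, v)}):
            incident = [i for i, (u, v, _) in enumerate(edges) if node in (u, v)]
            if len(incident) != 2:
                continue
            (u1, v1, p1), (u2, v2, p2) = edges[incident[0]], edges[incident[1]]
            if u1 == v1 or u2 == v2:
                continue  # a self-ring is attached here; leave it alone
            # orient the first path to end at node and the second to start at it
            a, first = (u1, p1) if v1 == node else (v1, p1[::-1])
            b, second = (v2, p2) if u2 == node else (u2, p2[::-1])
            joined = (a, b, tuple(first) + (node,) + tuple(second))
            edges = [e for i, e in enumerate(edges) if i not in incident] + [joined]
            break
        else:
            return edges  # no compressible node is left


ring = [(1, 2, ()), (2, 3, ()), (3, 4, ()), (4, 5, ()), (5, 6, ()), (6, 1, ())]
print(compress_two_nodes(ring))  # [(6, 6, (5, 4, 3, 2, 1))]
```

A single ring collapses to one self-ring; a naphthalene skeleton would instead reduce to the two fusion atoms joined by three paths.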
The second process that RING1 performs is to select the SSSR. The following rule is used:

(1) Find a path between two nodes.

(2) Is there a better path between these two nodes?

    NO - Go to (1).

    YES - Record in cycle storage the cycle obtained by concatenating the path obtained in (1) and the best path obtained in (2). Then erase from the connection table the path obtained in (1) above. Go to (1).

This rule is repeated until all cycles of the structure are found or steps (1) and (2) fail to find a cycle in the compound record. If an insufficient number of cycles is found, the compound is rejected with the notation "ring algorithm insufficient." Otherwise processing proceeds with key assignment. Two examples of the selection of the SSSR are given below:

Consider  the  compound: 


[Structure diagram: first example compound, atoms numbered 1 to 11]


Ring  Selection  Process: 

(1)  Select  path:  7-8-9 

(2)  Best  alternate  path:  None 

(3)  Select  path:  9-10-11-1 

(4)  Best  alternate  path:  9-2-1 

* (5) Record ring:

(6)  Remove  path:  9-10-11-1 

(7)  Select  path:  2-3-4-5-6-7 

(8)  Best  alternate  path:  2-1-7 

*  (9)  Record  ring: 

(10)  Remove  path:  2-3-4-5-6-7. 

The compound is now:

[Structure diagram: the remaining ring, atoms 1, 2, 7, 8, 9]

* (11) Record ring:

The program would select these three rings as the rings of the above compound.


Consider the compound:

[Structure diagram: second example compound]

Ring  Selection  Process: 

(1)  Select  path:  1-2-3-4 


(2)  Best  alternate  path:  1-6-5-4 

*  (3)  Record  ring:  C^N 

(4)  Remove  path:  1-2-3-4 


The compound is now:

[Structure diagram: the remaining ring]

* (5) Record ring:


The  program  would  select  rings  C^N  and  C.O  for  the  above  compound. 

For the above processes, the major subprograms used are RING2, which performs the alternate path search, and RING3 and RING4, which perform the compression. In the alternate path search, one path is considered "better" than another if:

(1) It is shorter (contains fewer atoms), or

(2) The number of atoms is the same, but the "better" path

    (a) Contains more carbon atoms, or

    (b) Contains the same number of carbon atoms but contains more atoms of the highest-ranked heteroatom that appears in unequal amounts in the two paths. The precedence of heteroatoms is defined in the order: I, O, S, Se, Te, N, P, As, Sb, Bi, Si, Ge, Sn, Pb, Hg, B, all others.

For example, if a choice is to be made between two paths of the same length and the same carbon count, one containing O and S and the other containing O and P, the O,S path would be considered the "better" path, since S outranks P.
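The "better path" test can be written as a comparison function. In this sketch, paths are lists of element symbols, and `is_better` and `PRECEDENCE` are hypothetical names; how the program ranks elements within the "all others" class is not stated, so the sketch simply lumps them together.

```python
from collections import Counter

# Heteroatom precedence as listed in the text; anything else ranks below these.
PRECEDENCE = ["I", "O", "S", "Se", "Te", "N", "P", "As", "Sb", "Bi",
              "Si", "Ge", "Sn", "Pb", "Hg", "B"]


def is_better(p, q):
    """True if path p is 'better' than path q; paths are lists of element symbols."""
    if len(p) != len(q):
        return len(p) < len(q)                # (1) fewer atoms wins
    cp, cq = Counter(p), Counter(q)
    if cp["C"] != cq["C"]:
        return cp["C"] > cq["C"]              # (2a) more carbon atoms wins
    for element in PRECEDENCE:                # (2b) highest-ranked unequal heteroatom
        if cp[element] != cq[element]:
            return cp[element] > cq[element]

    def others(c):
        return sum(n for el, n in c.items() if el != "C" and el not in PRECEDENCE)

    return others(cp) > others(cq)            # "all others" lumped together (assumption)


print(is_better(["C", "C", "O", "S"], ["C", "C", "O", "P"]))  # True
print(is_better(["C", "C", "O", "P"], ["C", "C", "O", "S"]))  # False
print(is_better(["C", "N", "C"], ["C", "C", "C", "C"]))       # True
```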

The assignment of keys is now performed. First, ring storage is examined to obtain the ring molecular formulas recorded during the ring selection process. For each ring in ring storage, a key is generated indicating the atom types composing the ring.

Next, the rings in ring storage are sorted according to size, from the smallest to the largest. At this point, each ring is assigned a number of the form 2^n (i.e., 1, 2, 4, 8, 16, ...).

The identification of the nuclei follows. This is done by partitioning the rings into equivalence classes. The classes are defined such that Ring 1 is a member of the same equivalence class (nucleus) as Ring 2 if and only if there exists at least one atom A such that A is in both Ring 1 and Ring 2. This is accomplished by intersecting the atom numbers of each ring with the atom numbers of all other rings.

If two rings have an atom in common, they are noted as being in the same nucleus. When two rings are merged to form a nucleus, a notation is made as to which rings and which atoms are in that nucleus. Because sharing an atom is not by itself transitive, this process is continued until the intersections yield no further merging, so that rings linked through a chain of shared atoms end up in the same nucleus.
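The ring numbering and nucleus merging can be sketched together. In this hypothetical sketch, each ring's 2^n number is OR-ed into a mask for its nucleus, so the number of rings in a nucleus is the number of one bits in that mask. When a ring is added, every existing nucleus that shares an atom with it is merged into one.

```python
def identify_nuclei(rings):
    """rings: atom sets sorted smallest first. Ring i gets the number 2**i; each nucleus
    is returned as (OR of its ring numbers, set of its atoms)."""
    nuclei = []
    for i, atoms in enumerate(rings):
        mask, merged = 1 << i, set(atoms)
        remaining = []
        for other_mask, other_atoms in nuclei:
            if other_atoms & merged:          # shares an atom: same nucleus
                mask |= other_mask
                merged |= other_atoms
            else:
                remaining.append((other_mask, other_atoms))
        nuclei = remaining + [(mask, merged)]
    return nuclei


rings = [{1, 2, 3, 4, 5, 6}, {5, 6, 7, 8, 9, 10}, {20, 21, 22, 23, 24, 25}]
for mask, atoms in identify_nuclei(rings):
    print(mask, bin(mask).count("1"), sorted(atoms))
```

Existing nuclei never share atoms with one another, so testing against the growing merged set gives the same result as testing against the new ring alone.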

For each nucleus, the list of rings in that nucleus is used to obtain both the key for the number of rings in the nucleus and the redundant numerical ring population key (by going back to ring storage for each ring number and obtaining the size of the ring).

At the same time, for each nucleus, the atoms composing it are obtained from the list made during the intersection of the rings. The original structure is searched to obtain the element type of each atom, and the skeleton molecular formula key is assigned.

The  above  processing  is  performed  for  each  of  the  nuclei,  keeping  a  count 
of  the  number  of  nuclei.  From  this,  "the  number  of  nuclei"  key  is  generated. 
These  counts  are  maintained  for  each  addend  separately. 

The number of double bonds in each nucleus is now computed. In a large part of the data processed by the CIDS programs, resonant bonds have been marked as type 4 bonds, and these must be processed separately from the standard notation for double bonds (i.e., type 2). If no bonds in a structure have been marked resonant, then the double bonds in the cyclic part are merely counted. If a cyclic substructure of a structure was previously marked resonant (denoted by type 4 bonds), then the total number of double bonds in the resonant substructure must be computed before determining the number of double bonds in the nonresonant cyclic part. In the resonant part:

$$DB = \frac{RB - RC + 1}{2}$$

where DB is the number of double bonds in the resonant part, RB is the number of resonant bonds, and RC is the number of ring closures. Since a connected substructure with RC ring closures has RB - RC + 1 atoms, this amounts to one double bond for every two atoms of the resonant system.
[Structure diagram: example structure, atoms numbered]

This structure is processed as follows:

(1) Start at node 6.

(2) It is connected to one atom not in the nucleus, so there is now 1 direct attachment.

(3) It is connected to two atoms in the nucleus, 5 and 7, which have not been processed. These are put in the "To Process" list, giving (5, 7), and the "Processed" list becomes (6). The numbers of double bonds in the 6-5 and 6-7 connections are saved.

(4) Now process atom 5.

(5) It has a resonant connection, so the atom at the other end is put into the "Resonant" list, giving (4).

(6) It has another resonant connection, so 18 is also put in the "Resonant" list, giving (4, 18).

(7) The total number of 4-bonds encountered is now 2.

(8) 5 is put into the "Processed" list, giving (6, 5), and the "To Process" list is (7).

(9) Next, process resonant atom 4.

(10) 5 has been processed, so look at 3.

(11) 3 is not processed, so add it to the "Resonant" list, which is now (18, 3).

(12) Put 4 in the "Processed" list, giving (6, 5, 4).

(13) The "Resonant" list is not empty, so process 18.

(14) 18 is connected to 17, so add 17 to the "To Process" list, giving (7, 17). Also add the number of double bonds to the D-bond count.

(15) 18 is also connected to 19, so add 19 to the "Resonant" list, giving (3, 19).

(16) Mark 18 as processed; the "Processed" list is now (6, 5, 4, 18).

The process is continued as follows:

Atom Being  Processed              Resonant  To Process  4-bonds  D-bonds  Ring
Processed                                                                  Closure
3           (4,5,6,18)             (19)      (7,17)      4        0        0
19          (3,4,5,6,18)           (20,2)    (7,17)      6        0        0

    Now 19 is processed, but 19 points to 18, which is processed. Therefore this is a ring closure. Add 1 to the ring closure count and continue.

20          (3,4,5,6,18,19)        (2)       (7,17)      7        0        1
2           (3,4,5,6,18,19,20)     (21)      (7,17)      8        0        1
21          (2-6,18,19,20)         (1)       (7,17)      9        0        1
1           (2-6,18-21)            (22)      (7,17)      10       0        1

    Again a ring closure has been found. Atom 1 connects to 2, which is processed, and to 22, which is in a table. In addition, it connects to 24, which is not in this nucleus. Therefore the number of direct attachments is increased to 2.

22          (1-6,18-21)            (0)       (7,17)      11       0        2

    The "Resonant" table has now been emptied. Before going back to the "To Process" table, perform the computation DB = (RB - RC + 1)/2 = (11 - 2 + 1)/2 = 5 double bonds.

7           (1-6,18-22)            (0)       (17)        0        5        0
8           (1-7,18-22)            (9,16)    (17)        2        5        0
16          (1-8,18-22)            (9)       (17)        3        5        0
9           (1-8,16,18-22)         (15)      (17,23)     4        5        0
15          (1-9,16,18-22)         (14,10)   (17,23)     6        5        0
14          (1-9,15-16,18-22)      (10)      (17,23)     7        5        1
10          (1-9,14-16,18-22)      (13)      (17,23)     8        5        1
13          (1-10,14-16,18-22)     (11)      (17,23)     9        5        1
11          (1-10,13-16,18-22)     (12)      (17,23)     10       5        1
12          (1-11,13-16,18-22)     (0)       (17,23)     11       5        2

    No more resonant bonds; therefore compute (11 - 2 + 1)/2 = 5 double bonds.

17          (1-16,18-22)           (0)       (23)        0        10       0
23          (1-22)                 (0)       (0)         0        11       0

Totals: 11 double bonds, 2 direct attachments.

This is the procedure for determining the number of double bonds in a structure. The following basic assumptions have been made:

(1) There are no resonant spiro structures.

(2) The number of double bonds in a resonant structure is (RB - RC + 1)/2, as stated above.
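The double-bond count for each resonant system can be sketched from its list of type 4 bonds. This hypothetical sketch traverses each connected resonant system and takes RC as the number of bonds beyond a spanning tree, which equals the number of ring closures met in any traversal, then applies DB = (RB - RC + 1)/2.

```python
def resonant_systems(resonant_bonds):
    """resonant_bonds: (a, b) pairs marked as type 4 bonds.
    Returns (RB, RC, DB) for each connected resonant system."""
    adj = {}
    for a, b in resonant_bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, results = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, atoms, bond_ends = [start], 0, 0
        while stack:
            a = stack.pop()
            atoms += 1
            bond_ends += len(adj[a])
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        rb = bond_ends // 2           # each bond is counted from both ends
        rc = rb - (atoms - 1)         # bonds beyond a spanning tree close rings
        db = (rb - rc + 1) // 2
        results.append((rb, rc, db))
    return results


naphthalene = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1),
               (6, 7), (7, 8), (8, 9), (9, 10), (10, 5)]
print(resonant_systems(naphthalene))  # [(11, 2, 5)]
```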

At this point, the key for the total number of direct attachments to all nuclei is assigned.

During this computation, a note has been made of the number of bonds of each type appearing in each path; e.g., for the path between 16 and 17, one double bond was noted. Only those structures not already marked resonant (non-CAS data) will be marked resonant by the next process. To mark the resonant structure, ring storage is again consulted. The process is illustrated by means of a typical example. Consider the compound:

First, all rings with an odd number of atoms are removed. It has been assumed that any structure with an odd number of nodes cannot be resonant.


[Structure diagram: example compound with numbered atoms]

Next, try to find a "handle". A handle is defined as a ring that meets the requirement of being resonant, as defined by the rule that the number of double bonds be exactly one-half the number of nodes. Suppose that ring (3,4,5,21,22,23) is chosen. It is determined that by itself it is not resonant (6 nodes, 2 double bonds). Now pick (1,2,3,23,24,25). This satisfies the criterion for resonance, and therefore is considered to be the first handle.

The next step is to find another ring connected to this ring. This is done similarly to the nucleus search. Each of the other rings in ring storage is intersected with the handle. It is found that (3,4,5,21,22,23) is connected to the handle. When (1,2,3,4,5,21,22,23,24,25) is tested for resonance, we find that this whole structure is resonant, and the individual rings are erased from storage. This new structure is now the handle. Again, a search is made for a ring that is concatenated with the new structure, and (5,6,7,19,20,21) is found. It is found that (1,2,3,4,5,6,7,19,20,21,22,23,24,25) is not resonant. Not only is ring (5,6,7,19,20,21) rejected as being part of this resonant structure, but it can never be a member of any resonant structure, and so it is erased from ring storage. No other rings can be found in common with the first resonant structure, so the program proceeds to mark the bonds of these rings as being resonant.

The  same  procedure  is  followed  for  the  second  part  of  the  structure.  It 
is  found  that  the  second  part  is  also  resonant  and  it  is  marked  accordingly. 
Finally,  the  "marked-up"  structure  is  copied  into  the  output  area. 
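The handle-growing procedure can be sketched on sets of ring atoms. This is a hypothetical illustration that checks only the double-bond/node rule stated above; rings are atom sets, double bonds are frozensets of two atom numbers, and a ring that spoils a handle is erased for good, as the text describes.

```python
def find_resonant_systems(rings, double_bonds):
    """rings: atom sets from ring storage; double_bonds: set of frozenset({a, b}).
    Returns the atom sets of the resonant structures found by handle growing."""
    def resonant(atoms):
        # resonant if the double bonds inside the atom set number exactly half its nodes
        inside = sum(1 for bond in double_bonds if bond <= atoms)
        return 2 * inside == len(atoms)

    storage = [frozenset(r) for r in rings if len(r) % 2 == 0]  # odd rings dropped
    systems = []
    while True:
        handle = next((r for r in storage if resonant(r)), None)
        if handle is None:
            return systems
        storage.remove(handle)
        grew = True
        while grew:
            grew = False
            for ring in list(storage):
                if ring & handle:                 # ring connected to the handle
                    storage.remove(ring)          # merged, or erased for good
                    if resonant(handle | ring):
                        handle = handle | ring
                        grew = True
        systems.append(handle)


rings = [{1, 2, 3, 4, 5, 6}, {5, 6, 7, 8, 9, 10}, {9, 10, 17, 18, 19, 20},
         {11, 12, 13, 14, 15, 16}, {30, 31, 32, 33, 34}]
double_bonds = {frozenset(b) for b in [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]}
print([sorted(s) for s in find_resonant_systems(rings, double_bonds)])
```

Here the two fused six-membered rings form one resonant structure, the third ring is rejected because adding it breaks the rule, the isolated ring with one double bond never qualifies as a handle, and the five-membered ring is discarded at the start.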


2.4.4.2 Program Structure

RING1 is a subroutine of the Key Assignment System. It is called by a "TSX RING1" instruction. It in turn calls subroutines RING2, RING3, and RING4. The program requires the following input:

(1)  CIDS  formatted  connection  table 

(2) Flag word CASWCH set plus for CAS data, otherwise minus

(3)  Index  Register  1  contains  size  of  connection  table 

(4)  Address  of  connection  table  right-adjusted  in  accumulator 




The  following  output  is  produced  by  the  program: 

(1) CIDS formatted connection table. (If non-CAS data, ring bonds, ring atoms, and resonant bonds have been marked. The connection table for CAS data is left unchanged.)

(2) The keys assigned are stored, 2 words per key, starting at location SCKY+1.

(3) BADAT = 0 if no error was detected in the connection table; BADAT ≠ 0 if the compound record is bad.

In  addition,  if  sense  switch  2  is  pressed  in,  the  keys  which  were  assigned 
are  printed.  If  sense  switch  3  is  pressed  in,  an  octal  dump  of  the  marked- 
up  connection  table  is  printed. 

The following diagnostics are printed to signal conditions that cannot be handled by the program:


(1) "Over 3 non-specific types in ring". This means that a ring or nucleus contains more than three different kinds of elements other than C, N, O, P, and S.

(2) "Too many rings per nuclei". This means a nucleus contains more than 17 rings.

(3) "Ring algorithm insufficient". This message appears when structures are too complex to be analyzed by the present ring algorithm.

(4) Error E1 - bad connection table.

A  macro  flow  chart  of  the  program  is  presented  in  Figure  37. 


Figure 37. Macro Flow Chart - RING1
2.4.5 Alternate Path Search


Code  Name:  MUSTRP  or  RING2 

Programmer:  Helen  Hill 


Abstract: MUSTRP is given a connection table path between two nodes in a ring and searches for any alternate paths between these two nodes. If more than one alternate path is found, the "best" of these is chosen.


2.4.5.1 Program Description

MUSTRP examines a connection table that has been compressed as described in Section 2.4.4.1. The program takes a pointer to a given path between two nodes in a ring and looks for the "best" alternate path between those two nodes. The criterion by which one path is considered to be "better" than another path is described in Section 2.4.4.1.

On being called by the main ring program RING1, MUSTRP checks that the given pointer indicates a path of length 2 or greater and also compares the two terminal nodes. If these are found to be the same node, control is returned to the main program with an indication that a self-ring was found. This occurs when a ring contains at most one atom that has more than two connections.

The program follows each path from one of the nodes until the path length exceeds the given path length without having passed through the second node. Given the compound

[Structure diagram: example compound, atoms numbered 1 to 8]

and given the path 1-2-3, the program begins with atom 3 and examines each path from this atom. Path 3-5-4-1 is found to be too long to qualify as "better" than the given path 1-2-3, but it is an alternate path, so it is noted as the present best alternate path. Path 3-6 is a possible candidate, so it is stored in a linked list. The program then examines the other paths leading from atom 6 to determine whether the 3-6 path is part of a better path between 3 and 1. Path 3-6-7-8-6 is considered and found to be too long to qualify. Path 3-6-8-7-6 is also too long. The only remaining alternative is 3-6-1, and this turns out to be a better path than the present best alternate path 3-5-4-1, so it replaces it as the best alternate path. This path is returned to the main program with a bit set indicating that an alternate path was found. If no alternate path is found, the given path is returned without the bit being set.
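The alternate path search can be sketched as a branch-and-bound depth-first search. This is a simplified illustration, not MUSTRP: it works on a plain (uncompressed) adjacency table, ranks paths only by length and then by carbon count, and `best_alternate_path` is a hypothetical name. The example connection table is reconstructed from the walkthrough above.

```python
def best_alternate_path(adj, elements, given):
    """Branch-and-bound search from the last atom of `given` back to its first atom,
    avoiding the interior of `given`. Returns the best alternate path or None."""
    start, goal = given[-1], given[0]
    banned = set(given[1:-1])
    best = None

    def key(path):
        # shorter is better; for equal length, more carbon atoms is better
        return (len(path), -sum(elements[a] == "C" for a in path))

    def search(path):
        nonlocal best
        if best is not None and len(path) > len(best):
            return                            # already longer than the best found
        atom = path[-1]
        if atom == goal:
            if best is None or key(path) < key(best):
                best = list(path)
            return
        for nbr in sorted(adj[atom]):
            if nbr in path or nbr in banned:
                continue                      # a path may not return on itself
            if len(given) == 2 and len(path) == 1 and nbr == goal:
                continue                      # that would be the given bond itself
            search(path + [nbr])

    search([start])
    return best


adj = {1: {2, 4, 6}, 2: {1, 3}, 3: {2, 5, 6}, 4: {1, 5},
       5: {3, 4}, 6: {1, 3, 7, 8}, 7: {6, 8}, 8: {6, 7}}
elements = {atom: "C" for atom in adj}
print(best_alternate_path(adj, elements, [1, 2, 3]))  # [3, 6, 1]
```

As in the text, 3-5-4-1 is found first and then replaced by the shorter 3-6-1, while 3-6-7-8 and 3-6-8-7 are abandoned as soon as they grow longer than the current best.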

A  flow  chart  of  the  program  is  presented  in  Figure  38. 


Figure 38. Macro Flow Chart - RING2


2.4.5.2 Program Structure


RING2 is a subroutine of the Ring Analysis portion of the Key Assignment System. The input consists of a pointer to a connection table entry. In addition, the X, E, C, and ECOUNT tables are in core.

The output consists of a 7-word block describing the best alternate path found by the program. The format of the block is:

Word    Contents
1       Path length
2-3     One of 72 bits is set for each atom number in the path
4-7     Count of atoms of each element type in the path
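The 72-bit atom set in words 2-3 can be sketched as a Python integer split into two 36-bit words. The bit assigned to each atom number is not specified in the text; this sketch assumes, hypothetically, that atom n maps to bit n-1 of the combined 72-bit value, with word 2 holding the high half.

```python
WORD_BITS = 36


def path_atom_mask(atoms):
    """Set one bit per atom number (1-72) and split the result into two 36-bit words.
    Atom n is mapped to bit n-1 of a 72-bit integer (hypothetical bit order)."""
    mask = 0
    for n in atoms:
        if not 1 <= n <= 72:
            raise ValueError("atom numbers must lie between 1 and 72")
        mask |= 1 << (n - 1)
    high, low = mask >> WORD_BITS, mask & ((1 << WORD_BITS) - 1)
    return high, low  # words 2 and 3 of the block


def atoms_from_mask(high, low):
    mask = (high << WORD_BITS) | low
    return [n for n in range(1, 73) if mask >> (n - 1) & 1]


w2, w3 = path_atom_mask([3, 6, 1, 40])
print(atoms_from_mask(w2, w3))  # [1, 3, 6, 40]
```

A set representation of this kind makes testing whether two paths share an atom a single AND operation per word.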

In addition, the following indicators are set by the program:

BADAT   Set to indicate that an error was detected
SPIRO   Set if a self-ring was found
QUAL    Set if an alternate path was found


2.4.6 Ring Compression


Code  Name:  COMPR  or  RING3 

Programmer:  John  D.  Leggett 

Abstract: The purpose of program COMPR is to remove all atoms in the connection table that have exactly two attachments and to remove side chains from the structure, in order that the ring descriptors may be found. The program also contains a subroutine that removes a prescribed path from the structure.

2.4.6.1 Program Description

The compression portion of the program operates similarly to program CONVRT, with two exceptions: (1) all atoms with two connections are removed, and (2) the bond table is not treated. As each atom is removed, the internal node numbers (or numbers representing previously removed atoms) are combined to form the description of the path between the two atoms to which the removed atom is connected. Likewise, the element descriptions of the paths are combined to give the specification of the new path. Upon completion of the first compression, the side chains are removed and the structure is compressed again.

When  a  ring  has  been  found  in  the  structure  by  program  RING1,  the  path 
removal  subroutine  (SSC)  is  called.  The  connection  table  is  compressed  again 
and  side  chains  are  removed  after  a  path  has  been  deleted.  If  a  self-ring  is 
found  at  any  stage  of  the  compression,  control  is  immediately  returned  to  the 
main  ring  program,  since  this  ring  must  be  one  of  the  desired  cycles. 

A flow chart of the program is presented in Figure 39.

2.4.6.2 Program Structure

RING3  is  a  subroutine  of  program  RING1.  The  input  to  the  program  is 
the  connection  table  produced  by  RING4.  The  output  consists  of  the  compressed 
connection  table  and/or  a  connection  table  with  some  path  removed,  and/or  the 
location  of  any  self-ring. 

If  an  error  is  found  in  the  connection  table,  control  is  returned  to  the 
main  ring  program  with  an  error  bit  set. 
Figure 39. Macro Flow Chart - RING3
2.4.7 Connection Table Expansion


Code Name: TABLE or RING4

Programmer:  John  D.  Leggett 

Abstract: The purpose of the program is to expand a connection table
given in the compressed format (see program CONVRT) to a form suitable for
application of the ring analysis programs.

2.4.7.1 Program Description

The program takes the connection table in the form in which it is
output from CONVRT (Section 2.1.2) and expands it into X and E tables. This
form is the same as the stage just prior to formatting in CONVRT.

The next step is the generation of path descriptors. The first such
descriptor consists of the atom numbers of the nodes present in a given path,
including the "from atom" but not the "to atom". For any atoms not explicitly
specified in the connection table (i.e. if a path is of length greater than
one), artificial atom numbers are added. The second descriptor is the element
kind of each atom in the given path.
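The two path descriptors can be illustrated with a short Python sketch. It assumes a hypothetical input in which each path is given as `(from_atom, to_atom, inner_elements)`, with `inner_elements` holding the element kinds of the atoms not listed in the connection table, and it assumes artificial atom numbers simply continue after the highest real atom number; the actual numbering scheme used by RING4 is not stated here.

```python
from typing import Dict, List, Sequence, Tuple


def path_descriptors(
    paths: Sequence[Tuple[int, int, Sequence[str]]],
    n_atoms: int,
    element_of: Dict[int, str],
) -> Tuple[List[List[int]], List[List[str]]]:
    """Build the atom-number and element-kind descriptors of each path."""
    next_number = n_atoms + 1  # assumed start of the artificial atom numbers
    atom_desc: List[List[int]] = []
    elem_desc: List[List[str]] = []
    for frm, _to, inner in paths:
        # the "from atom" is included, the "to atom" is not
        atoms, elems = [frm], [element_of[frm]]
        for element in inner:  # atoms implied by a path longer than one bond
            atoms.append(next_number)
            elems.append(element)
            next_number += 1
        atom_desc.append(atoms)
        elem_desc.append(elems)
    return atom_desc, elem_desc
```

A path of length three from N(1) to C(2) with two implied carbons yields the atom descriptor `[1, 4, 5]` and the element descriptor `["N", "C", "C"]` when the table lists three atoms.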

A  macro  flow  chart  of  the  program  is  presented  in  Figure  40. 

2.4.7.2 Program Structure

RING4 is a subroutine which is called by RING1. It is given a compressed
connection table (Section 2.1.2.2) as input. The output consists of X and E
tables as described in Section 2.1.2.2, together with the path descriptors
for use by the other ring analysis programs.

If an error is detected in the connection table, control is returned to
the main ring program with an error bit set.


2.4.8 Atom-by-Atom Search


Code  Name:  STRUC 

Programmer:  John  D.  Leggett 

Abstract: The purpose of the atom-by-atom search program is to determine
if a one-to-one correspondence exists between the nodes (atoms) and connections
in a given query structure and some set of nodes and connections in a given
file compound structure. The element kinds of the nodes in each are compared,
as well as the connections between the atoms and any cyclic properties. The
output of the program is a yes or no answer for a given query fragment and
compound structure. Provision is made to erase portions of the file compound
in core during processing to avoid fragment overlap during repetitive search.


2.4.8.1 Program Description

For a one-to-one correspondence to exist between the query nodes
(atoms) and branches (bonds) and some set of nodes and branches in the file
compound, it is necessary and sufficient that every possible path through the
query structure have a corresponding path through the file compound. It is
not necessary to examine every path of the file compound, provided that every
node and branch in the query is contained in the set of paths that we choose to
examine.

It is required that each pair of atoms matched have the same element kind,
except for two special codes used in queries, which may match any of a set
of element types in a file compound. These are (1) EE, which represents any
element except hydrogen, and (2) EL, which represents any element except carbon
and hydrogen. In addition, any abnormality in the query must be satisfied.

It  is  also  required  that  each  pair  of  atoms  matched  have  the  same  number 
of  direct  attachments  except  where  this  restriction  is  specifically  relaxed  by 
a  special  character  in  the  query. 

In the connection table, special indicators mark atoms which are members
of a ring and paths which are part of a ring. The atom-by-atom search requires
that matched atoms must either both be in a ring or both not be in a ring.
Likewise, matched paths must both be ring paths or both be non-ring paths.
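The atom-level tests above (element kind with the EE and EL wildcards, number of direct attachments, and ring membership) can be written as a small Python predicate. This is a sketch only: the `Atom` record and its `any_attachments` field are hypothetical stand-ins for the connection-table entries and for the query's special relaxing character, and abnormality checking is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    element: str                   # element symbol, or EE / EL in a query
    attachments: int               # number of direct attachments
    in_ring: bool                  # ring-membership indicator
    any_attachments: bool = False  # hypothetical flag for the relaxing character


def element_matches(query: str, file: str) -> bool:
    if query == "EE":  # any element except hydrogen
        return file != "H"
    if query == "EL":  # any element except carbon and hydrogen
        return file not in ("C", "H")
    return query == file


def atoms_compatible(q: Atom, f: Atom) -> bool:
    """Tests applied to each hypothesised (query atom, file atom) pair."""
    if not element_matches(q.element, f.element):
        return False
    if not q.any_attachments and q.attachments != f.attachments:
        return False
    return q.in_ring == f.in_ring
```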

The  element  of  the  starting  query  atom  is  matched  against  successive 
file  atoms  until  a  match  is  found.  The  trace  then  begins  by  placing  one  of 
the  connections  in  the  hypothesis  list,  and  the  remainder  in  a  choice  list. 

A similar step is taken in the file compound, and the pair of atoms in the
hypothesis list is matched. If the test is successful, the stepping continues
until (a) the end of a path is reached, (b) a cycle is closed, or (c) a mismatch
is found. If (a) or (b) occurs, the program searches back up the choice list
for the last available (untraced) path in the query and file compound, and the
trace begins again. When a mismatch is found, the trace backs up to the last
available alternate path in the file compound and starts again using the same
query path.
When the alternate paths have been exhausted, a new file compound starting
atom must be selected. If the list of possible starting atoms is exhausted,
the search is said to fail.

If every path in the query is successfully matched against a path in the
file compound, the search is said to pass. In this case any appropriate erasure
of the file compound takes place. Thus, if the program is called again to
determine whether the query fragment is present in another part of the file
compound, the set of atoms which were previously matched with the query fragment
are no longer available choices, and the program must look further in the
connection table to locate some other set of atoms which satisfies all the
requirements of the query.
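The search-with-backtracking and repeated-search-with-erasure behaviour described above can be sketched in Python. The original traces paths using hypothesis and choice lists; the sketch swaps in a plain recursive backtracking search over the same alternatives, so it illustrates the idea rather than reproducing STRUC. It assumes a connected query fragment, uses simple element equality as the only atom test (the attachment, ring, and wildcard tests could be plugged in at the marked line), and represents both structures as hypothetical adjacency dictionaries.

```python
from collections import deque
from typing import AbstractSet, Dict, Optional, Set

Adjacency = Dict[int, Set[int]]  # atom number -> numbers of directly attached atoms


def find_match(q_adj: Adjacency, q_el: Dict[int, str],
               f_adj: Adjacency, f_el: Dict[int, str],
               erased: AbstractSet[int] = frozenset()) -> Optional[Dict[int, int]]:
    """Map every query atom to a distinct, non-erased file atom, or return None."""
    # Visit query atoms breadth-first from the starting atom, so that each later
    # atom is reached through a neighbour that has already been matched.
    start = next(iter(q_adj))
    order, seen, queue = [], {start}, deque([start])
    while queue:
        atom = queue.popleft()
        order.append(atom)
        for nbr in sorted(q_adj[atom]):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)

    mapping: Dict[int, int] = {}

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every query atom and bond has a counterpart
        qa = order[i]
        matched = [mapping[b] for b in q_adj[qa] if b in mapping]
        candidates = f_adj[matched[0]] if matched else f_adj.keys()
        for fa in sorted(candidates):
            # atom-level test; a fuller version would also compare
            # attachment counts and ring indicators here
            if fa in erased or fa in mapping.values() or f_el[fa] != q_el[qa]:
                continue
            # every query bond to an already-matched atom (including a bond
            # that closes a query ring) must exist in the file compound
            if all(m in f_adj[fa] for m in matched):
                mapping[qa] = fa
                if extend(i + 1):
                    return True
                del mapping[qa]  # back up and try the next alternative
        return False

    return dict(mapping) if extend(0) else None


def count_occurrences(q_adj: Adjacency, q_el: Dict[int, str],
                      f_adj: Adjacency, f_el: Dict[int, str]) -> int:
    """Call the search repeatedly, erasing each matched set of file atoms."""
    erased: Set[int] = set()
    count = 0
    while True:
        match = find_match(q_adj, q_el, f_adj, f_el, erased)
        if match is None:
            return count
        erased |= set(match.values())
        count += 1
```

Searching for a C-O fragment in O-C-C-O succeeds twice: after the first pass the matched O and C are erased, so the second pass is forced onto the other end of the molecule.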


2.4.8.2 Program Structure

STRUC is a subroutine which serves important functions in several different
parts of the CIDS system. In the Key Assignment System, STRUC is called
to  determine  if  a  given  screen  fragment  is  satisfied  by  a  given  file  compound 
in  order  to  decide  if  the  corresponding  key  should  be  assigned  to  the  compound. 
In  the  Retrieval  System,  STRUC  is  called  to  decide  whether  a  file  compound 
should  be  retrieved  by  a  query  by  determining  whether  the  compound  satisfies 
the  query  structure.  In  the  Registration  System,  STRUC  determines  if  a  possible 
registrant  has  the  same  connection  table  as  a  registered  compound  which  has 
the  same  molecular  formula. 

The program requires as input the core locations of the connection tables
for the query fragment and the file compound, and the length of the latter.
In addition, the location(s) of the abnormality table(s), if any, are required.
Location ERASE is used to indicate the type of erasure (if any) to be performed.

If a match was found between the query and the file compound, STRUC returns
to the calling program with the accumulator non-zero. If the search failed,
the accumulator is set to zero before returning control.

A  macro  flow  chart  of  the  program  is  presented  in  Figure  41. 
2.4.9 Nonspecific Hydrocarbon Radical Key Assignment


Code  Name:  HCRCT 

Programmer:  Jeffrey  H.  Kulick 

Abstract: Program HCRCT has been developed to assign to a structure a
sub-class of the set of aliphatic hydrocarbon radical keys.

2.4.9.1 Program Description

HCRCT  assigns  to  a  compound  the  following  hydrocarbon  radical  keys: 

(1) A single key, 3-A-1-28, is assigned to compounds
which contain a straight single-bonded (i.e. saturated)
carbon chain which is attached to only one non-carbon
atom and which contains more than 19 carbons.

(2) The key (C)n-El is assigned to any single-bonded
carbon chain (branched or unbranched) which contains
at least 5 carbon atoms and is attached to one and
only one non-carbon atom (El). A different key is
assigned for each value of n.

Program  HCRCT  first  creates  two  dictionaries.  Both  have  one  entry  per  atom. 
The  first  dictionary  contains  the  following  information  for  each  atom: 

(1)  Ring  or  non-ring  atom 

(2)  Number  of  attachments  to  the  atom. 

The second dictionary contains the following information for each atom:

(1) Carbon or non-carbon

(2) A number which identifies the hydrocarbon
radical to which this atom has been assigned

(3) Location of this atom in the connection table.

The program first locates a "candidate atom" (a carbon atom with only one
attachment that has not previously been assigned to a hydrocarbon radical).
If the attached atom is carbon, then each of its connections is searched
in turn to see if the structure qualifies as a monovalent hydrocarbon
radical (HCR). A count of the number of atoms is maintained as this
search progresses. Additionally, as each atom is searched in turn, the
second entry of Dictionary 2 is filled in, indicating membership in a
candidate HCR. This prevents double assignment for atoms 1 and 2 in the
structure:

            C1
            |
    N - C - C - C - C2
The search along any particular path is continued until a non-carbon
is encountered. The number of non-carbons attached to any candidate HCR
is saved, and a key is assigned only if this count equals one. A candidate
HCR is also rejected if any double or triple bonds appear within it. A
further restriction is that none of the carbon atoms within the HCR may be
a member of a ring, but the non-carbon to which it is attached may or may not
be.


A flow chart of the program is presented in Figure 42.

2.4.9.2 Program Structure

HCRCT  is  a  subroutine  of  the  Key  Assignment  System.  It  requires  as  input 
a compressed connection table as described in Section 2.1.2.2. The accumulator
must  contain  the  core  address  of  the  connection  table,  and  index  register  1 
must  contain  the  length  of  the  connection  table.  The  program  stores  any  keys 
that  are  assigned  to  the  compound  in  an  array  for  later  addition  to  the  compound 
record. 
Figure 42. Macro Flow Chart - HCRCT


2.4.10  Bond  Count 


Code  Name:  BONDCT 
Programmer:  Jeffrey  H.  Kulick 

Abstract: Program BONDCT was developed to assign a specific limited subclass
of the aliphatic keys. It assigns the acyclic nucleus keys and two
types of hydrocarbon radical keys.

2.4.10.1  Program  Description 

BONDCT assigns the following keys, described in CIDS No. 4 as acyclic
nuclei keys, to acyclic compounds or acyclic addends of cyclic compounds only:

(1) Number of double bonds between carbon atoms

(2) Number of triple bonds between carbon atoms

(3) Number of carbons with three carbons attached,
regardless of internal bonding or additional
non-carbon connections, e.g.

        C - C - C
            |
            C

(4) Number of carbons with four carbons attached,
regardless of additional connections, e.g.

            C
            |
        C - C - C
            |
            C

Note that keys are assigned for each of the above four structural features
even if some of the resulting counts are zero. Separate counts are made
for each acyclic parent or addend, and separate keys are assigned for each,
except that at most one zero key is assigned to a compound for any of the
four above keys.

In addition, the following hydrocarbon radical keys are assigned to
any compounds in which they occur:

(1) El-(C)n-El: For any aliphatic unbranched single-bonded
carbon chain attached to two non-carbons (El), a key is
assigned which gives the length n of the chain.
Each El can be any element other than carbon (or hydrogen)
and may or may not be a member of a ring. They cannot, of
course, be in the same nucleus.

(2)  El — C<-*C  —  El  This  key  is  listed  as  3-B-2-1  in  CIDS 
No. 4.


Program BONDCT first creates two dictionaries. One has an entry for each atom.
This dictionary contains information as to whether the atom is carbon or not,
whether it is in a ring or not, and which addend it is in.

The second dictionary contains one entry for each entry in the bond table,
i.e., for every bond in the CIDS compressed connection table as described in
Section 2.1.2.2. This dictionary tells whether a particular bond is a ring bond
(a bond in a cycle).

BONDCT  starts  at  the  last  atom,  and  processes  each  atom  from  last  to  first. 
The  basic  processing  for  each  atom  is  as  follows: 

(1) Find an atom.

(2) For each bond chain (compressed carbon chain) connected to
that atom:

(a) Count the double and triple bonds between carbon
atoms.

(b) If the "From" and "To" atoms are non-carbon and
all bonds are single bonds, count the bonds and
assign the key for El-(C)n-El.

(3) If the "From" atom is carbon, check the number of carbon
connections for a possible increase in the count of carbons
with three or four carbons attached.

When all atoms have been processed, all sums are divided in half (because of
redundant entries) and keys are assigned.
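The per-atom counting and the final halving can be illustrated with a Python sketch. Because every carbon-carbon bond is seen once from each of its two atoms, the double- and triple-bond sums come out doubled and are halved at the end. The sketch works on a hypothetical atom-level bond list with a separate set of ring bonds, rather than on the compressed bond chains BONDCT uses, and it counts the three- and four-carbon branch points once per atom (so those counts are not halved); that split is a simplification of this sketch, not a statement about BONDCT.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def bondct_counts(elements: Dict[int, str], bonds: List[Tuple[int, int, int]],
                  ring_bonds: Set[FrozenSet[int]]) -> Dict[str, int]:
    adj: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    sums = {"C=C": 0, "C#C": 0}
    branches = {"3C": 0, "4C": 0}
    for atom in sorted(elements, reverse=True):  # last atom to first
        if elements[atom] != "C":
            continue
        acyclic = [(n, o) for n, o in adj[atom] if frozenset((atom, n)) not in ring_bonds]
        for nbr, order in acyclic:
            if elements[nbr] == "C" and order == 2:
                sums["C=C"] += 1
            elif elements[nbr] == "C" and order == 3:
                sums["C#C"] += 1
        if len(acyclic) == len(adj[atom]):  # atom has no ring bonds
            carbons = sum(elements[n] == "C" for n, _ in adj[atom])
            if carbons in (3, 4):
                branches[f"{carbons}C"] += 1
    # each C-C bond was met once from each of its two atoms
    counts = {name: total // 2 for name, total in sums.items()}
    counts.update(branches)
    return counts
```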

A flow chart of the program is presented in Figure 43.

2.4.10.2 Program Structure

BONDCT is a subroutine of the Key Assignment System. A call to subroutine
HCRCT must precede a call to BONDCT because that program sets up certain arrays
which are used by BONDCT. The input requirements and the output are the same as
for program HCRCT (Section 2.4.9).


Figure 43. Macro Flow Chart - BONDCT


2.4.11 Molecular Formula Key Assignment


Code  Name:  MFSRN 

Programmer: Ruth V. Powers

Abstract: Molecular Formula keys are assigned to a compound based on the
Hill molecular formula. One key is assigned to identify each element present.

2.4.11.1  Program  Description 

A key is assigned for each element appearing in the Hill molecular
formula of a compound. The key specifies the number of atoms present for the
elements C, H, N, O, P, S, F, Cl, Br, I, Si and B. For all other elements
the key specifies only the presence of the element, regardless of the number
of atoms. In addition, a metal key is assigned to the compound if it contains
elements other than the twelve special elements listed above and As, Sb, Se, Te.
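The selection rules above can be sketched in Python. The formula is assumed to arrive as a dictionary of element symbol to atom count in Hill order; a count of zero stands for a presence-only key, matching the "Count" convention described in the program structure below.

```python
from typing import Dict, List, Tuple

# The twelve elements whose keys carry an atom count.
COUNTED = ("C", "H", "N", "O", "P", "S", "F", "Cl", "Br", "I", "Si", "B")
# Elements that, together with the twelve, do not trigger the metal key.
NOT_METAL = COUNTED + ("As", "Sb", "Se", "Te")


def mol_form_keys(hill_formula: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (element, count) pairs; count is 0 when only presence is recorded."""
    keys = [(el, n if el in COUNTED else 0) for el, n in hill_formula.items()]
    if any(el not in NOT_METAL for el in hill_formula):
        keys.append(("M", 0))  # one metal key per compound
    return keys
```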

A  flow  chart  of  the  program  is  presented  in  Figure  44. 

2.4.11.2  Program  Structure 

MFSRN  is  a  subroutine  in  the  Key  Assignment  System.  It  is  called  by  a 
'TSL  MFSRN'.  Location  COMPND  must  be  defined  in  the  calling  program  as  an 
entry  point.  The  location  must  contain  2  in  the  tag  portion,  and  the  address 
of  the  first  word  of  the  Hill  mol  form  (the  second  word  of  the  mol  form  block) 
in  the  address  portion. 

The  keys  assigned  to  the  compound  are  stored  in  the  SCKY  array,  which  is 
defined  as  an  entry  point  in  the  calling  program.  The  new  keys  are  added 
to  those  already  stored,  and  location  SCKY  is  updated  to  equal  the  number  of 
words  in  the  array.  The  second  word  of  each  mol  form  key  is  zero.  The  first 
word  has  the  format: 

Bits:      (S,1-5)   (6-17)   (18-35)
Contents:  101000    El       Count

"El"  is  the  abbreviation  of  the  element  kind  in  BCD.  For  a  single  letter 
abbreviation,  such  as  "C",  a  BCD  blank  precedes  the  letter.  "Count"  is  the 
number of atoms of that element kind if it is one of the twelve special elements
listed above. Otherwise it is zero. The abbreviation "M" takes the
place  of  the  element  kind  for  metal  keys. 
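The first word of a key can be assembled from the bit layout above. The Python sketch below treats a 36-bit word as an unsigned integer with bit S as the highest-order bit. The 6-bit character codes in `bcd()` are assumed to follow the IBM 7090 internal BCD table (blank = 60 octal, A-I = 21-31, J-R = 41-51, S-Z = 62-71), and two-letter symbols are assumed to be stored in upper case, since BCD has no lower-case letters.

```python
from typing import Tuple


def bcd(ch: str) -> int:
    """6-bit BCD code for a blank or an upper-case letter (assumed 7090 table)."""
    if ch == " ":
        return 0o60
    offset = ord(ch) - ord("A")
    if 0 <= offset <= 8:    # A-I
        return 0o21 + offset
    if 9 <= offset <= 17:   # J-R
        return 0o41 + offset - 9
    if 18 <= offset <= 25:  # S-Z
        return 0o62 + offset - 18
    raise ValueError(f"no BCD code in this sketch for {ch!r}")


def mol_form_key(symbol: str, count: int) -> Tuple[int, int]:
    """Two-word key: 101000 | El (two BCD characters) | Count, then a zero word."""
    text = symbol.upper().rjust(2)  # "C" -> " C" (leading blank), "Cl" -> "CL"
    element = (bcd(text[0]) << 6) | bcd(text[1])
    if not 0 <= count < 1 << 18:
        raise ValueError("count does not fit in bits 18-35")
    first = (0b101000 << 30) | (element << 18) | count
    return first, 0
```

In octal, the key for 20 carbon atoms reads 50 6023 000024: the prefix 101000, a blank and a C, then the count.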

If  sense  switch  2  is  pressed  in,  the  keys  are  printed  on  the  system 
output  tape. 


Figure  44.  Macro  Flow  Chart  -  MFSRN 
2.4.12 Nonspecific Phosphorus Functional Group

Code  Name:  PSCKYT 
Programmer:  Ed  Hebei 

Abstract: Subroutine PSCKYT assigns keys to compounds which contain certain
types of phosphorus functional groups which were not among those selected
as Specific Functional Group keys (listed in Table XIV in CIDS No. 4).


2.4.12.1  Program  Description 

Subroutine PSCKYT examines a compound connection table (C.T.) in core
for the presence of nonspecific phosphorus functional groups. Prior to this,
the C.T. was searched for the presence of Specific Functional Group fragments,
and whenever one of these was located, the non-carbon atoms which corresponded
to those in the fragment were zeroed out in the C.T. Thus, it is known that
none of the phosphorus atoms remaining in the C.T. were present in any of the
specific fragments.

PSCKYT first examines the C.T. for phosphorus atoms. When one is located,
its connections are examined, and keys are assigned based on the combination of
the following elements attached to the phosphorus atom: N, O, S, X
(any of the halogens: F, Cl, Br, I), and the CN group. The various keys which
may be assigned are listed in Table XVI in CIDS No. 4. A macro flow chart of the
program is presented in Figure 45.
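The examination of each phosphorus atom can be sketched in Python. The sketch only computes which of N, O, S, X and CN are attached to each remaining phosphorus atom; the mapping from those combinations to the actual keys of Table XVI is not reproduced. The input format is hypothetical, with atoms zeroed out by the specific-fragment pass represented as `None`, and a CN group is recognised here as an attached carbon triple-bonded to nitrogen.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

HALOGENS = {"F", "Cl", "Br", "I"}


def phosphorus_signatures(elements: Dict[int, Optional[str]],
                          bonds: List[Tuple[int, int, int]]) -> List[Tuple[str, ...]]:
    """For each remaining P atom, list which of N, O, S, X, CN are attached to it."""
    adj: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    signatures = []
    for atom in sorted(elements):
        if elements[atom] != "P":  # zeroed atoms (None) are skipped too
            continue
        groups = set()
        for nbr, _ in adj[atom]:
            el = elements[nbr]
            if el in ("N", "O", "S"):
                groups.add(el)
            elif el in HALOGENS:
                groups.add("X")
            elif el == "C" and any(elements[m] == "N" and order == 3
                                   for m, order in adj[nbr]):
                groups.add("CN")  # carbon of a nitrile group
        signatures.append(tuple(sorted(groups)))
    return signatures
```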

2.4.12.2  Program  Structure 

PSCKYT is called with the core address of the compound connection
table in the address portion of the accumulator. If any keys are assigned,
the corresponding two-word codes are stored in the SCKY array and the count of
2.5.1 Search File Creation or Update


Code  Name:  NUFILE 

Programmer: Peter J. Brown

Abstract: NUFILE creates a search file of compounds by simply assigning
each compound to an area in the file as it is input to NUFILE. An existing
file may be updated by the same process. NUFILE simultaneously creates a
tape of key and file address pairs which will be used by programs MERGE and
INDEX to create an index to the complete compound file.

2.5.1.1 Program Description

NUFILE may either create the initial file of compounds or update an existing
search file. Either function may be selected by placing an appropriate
data card at the end of the deck (described in Section 2.5.1.2).

NUFILE simultaneously creates or updates both a tape search file and a
disk search file. NUFILE may also create or update only a tape search file.
Either function may be selected by means of a sense switch (Section 2.5.1.2).
The disk search file is actually produced on tape in a format which will permit
an easy transference to a disk unit, and it differs from the tape search file
only in the blocking of the data. This difference causes the same compound to
have different file addresses in the two files. The key-address tape, from
which the Index is formed, must therefore have two addresses for each key on
the list.
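Why the same compound gets two different addresses can be seen in a Python sketch of the address assignment. It follows the blocking rules given in this section (tape blocks of up to 1000 words with no compound split; disk-format records of 465 words with a compound allowed to straddle two records; a maximum number of blocks per tape reel; input ending at a record whose first word is zero), but the function name, input format and tuple layout are hypothetical, and end-of-file marks, dummy blocks and the sense-switch option are not modelled.

```python
from typing import Iterable, List, Sequence, Tuple

TAPE_BLOCK_MAX = 1000  # words; a compound is never split across tape blocks
DISK_BLOCK = 465       # words; a compound may straddle two disk records

TapeAddr = Tuple[int, int, int]  # (tape number, record number, relative address)
DiskAddr = Tuple[int, int]       # (record number, relative address)


def nufile(records: Iterable[Tuple[Sequence[int], Sequence[int]]],
           max_blocks: int = 2400) -> List[Tuple[int, DiskAddr, TapeAddr]]:
    """records: (compound words, keys).  Returns the key-address records."""
    tape, block, fill = 1, 0, 0  # current reel, block on that reel, words used in block
    disk_words = 0               # words written so far to the disk-format file
    key_address = []
    for words, keys in records:
        if words[0] == 0:        # sentinel record: end of input
            break
        n = len(words)
        if n > DISK_BLOCK:       # keeps every compound within two disk records
            raise ValueError("compound longer than this sketch allows")
        if fill + n > TAPE_BLOCK_MAX:  # start a new tape block
            block, fill = block + 1, 0
            if block == max_blocks:    # block limit reached: switch reels
                tape, block = tape + 1, 0
        tape_addr = (tape, block, fill)
        fill += n
        disk_addr = (disk_words // DISK_BLOCK + 1, disk_words % DISK_BLOCK)
        disk_words += n
        for key in keys:
            key_address.append((key, disk_addr, tape_addr))
    return key_address
```

With compounds of 400, 300 and 400 words, the third compound starts a new tape block (record 1, address 0) but lands at address 235 of disk record 2, because the disk file packs words continuously.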

A macro flow chart of this program is presented in Figure 46.

2.5.1.2 Program Structure

NUFILE is a main program which requires as input (1) the new compound
tape(s) and (2) the last tape of the existing search file.

The  new  compound  file  is  the  output  of  the  final  screening  program. 

The information on this tape is blocked in IOBS Type 2 records. The last
compound is followed by a special sentinel record, the first word of which
is zero, to signal the end of the input data.

The  last  tape  of  the  existing  search  file  is  required  only  if  NUFILE 
is  being  used  to  update  an  existing  file.  The  information  on  this  tape, 
up  to  and  including  the  last  compound  record,  is  copied  onto  the  output 
tape.  Following  this  record  and  an  end-of-file  mark,  there  is  a  special 
record  containing  the  count  of  compounds  presently  in  the  file,  and  the  next 
file address to be assigned to a compound. After this tape is copied, it is
rewound  and  unloaded,  and  this  tape  drive  becomes  the  alternate  unit  for  the 
search-file  output  tape. 

Two data cards must be placed at the end of the deck (in the following
order):


(1) A card containing the maximum number of blocks to be
placed on each tape of the Tape Search File. If the
tape is filled before this number of blocks is reached,
tape switching automatically occurs. This is a safeguard
so that the tape may be copied onto a slightly shorter
tape and still contain the same number of blocks as the
original. For 2400-foot tapes, 2400 blocks leaves about
a quarter of an inch of turns unused. This number is
punched in columns 1 through 6; leading zeros must be
punched.

(2) A card with either the word NUFILE or UPDATE punched in
columns 1 through 6. This determines whether a new file
is to be created or an existing file updated.

Figure 46. Macro Flow Chart - NUFILE

The output consists of (1) the Tape Search File, (2) the Disk Search
File, and (3) the Key-Address tape.

The Tape Search File is composed of a series of tapes with variable-length
blocks, up to a maximum of 1000 words. A compound record cannot be split
between two physical records. The information on each tape is followed by
an end-of-file mark and a small (10-word) dummy block. At the end of the last
tape of the file, following two consecutive end-of-file marks, is a special
record 10 words long. The first word contains the next file address to be
assigned to a compound in the format:


Bits        Contents
(S,1-5)     Tape Number (1 - )
(6-18)      Record Number (0 - )
(19-35)     Relative address (0 - )

The second word contains (right-adjusted) the current number of compounds in
the file.

The Disk Search File contains the same items as the Tape Search File and
in the same order. The block size is 465 words. Compounds may be split between
two physical records, but never more than two. The end-of-tape and
end-of-file sentinels are the same as for the Tape Search File, except that the
10-word special record at the end of the data file contains the following
information:


Bits        Contents
(S,1-17)    Record (track) Number (1 - )
(18-35)     Relative address (0-464)
            Number of words left unfilled in the last data
            record written on the tape (right-adjusted)
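Both address formats above pack fields into a 36-bit word whose bits are numbered S, 1, ..., 35 from the high-order end. The Python sketch below packs such fields into an unsigned integer; the helper names are hypothetical, and the field widths come directly from the two tables (tape: 6, 13 and 17 bits; disk: 18 and 18 bits).

```python
from typing import Iterable, Tuple


def pack(fields: Iterable[Tuple[int, int, int]]) -> int:
    """Pack (first_bit, last_bit, value) fields into a 36-bit word.

    Bits are numbered S (= 0), 1, ..., 35 from the high-order end.
    """
    word = 0
    for first, last, value in fields:
        width = last - first + 1
        if not 0 <= value < 1 << width:
            raise ValueError(f"{value} does not fit in bits {first}-{last}")
        word |= value << (35 - last)
    return word


def tape_address(tape: int, record: int, relative: int) -> int:
    return pack([(0, 5, tape), (6, 18, record), (19, 35, relative)])


def disk_address(record: int, relative: int) -> int:
    return pack([(0, 17, record), (18, 35, relative)])
```

Because the tape number starts at 1 and the disk record number starts at 1, no valid address packs to a word of zeros.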

The Key-Address Tape is an IOBS tape of Type 1 records with LRL=4,
P.CT=250, block size=1000. It contains a four-word logical record for each key
occurrence on the compound input tape in the format:


Word   Contents
1      Key (high-order 36 bits)
2      Key (low-order 36 bits)
3      Address in Disk Search File of compound containing this key
4      Address in Tape Search File of compound containing this key

This tape is used as the input to KEYSRT.

2.5.1.3 Operator Instructions

The new compound tape(s) are loaded alternately on S.SU04 (C4) and
S.SU06 (C5). Output tapes for the Tape Search File are loaded alternately on
S.SU10 (B1) and S.SU07 (B6). Output tapes for the Disk Search File are loaded
alternately on S.SU19 (C6) and S.SU02 (C3). For Update runs, the last tape of
the previous Tape File is loaded on the first output unit for the Tape File.
The first output tape is then loaded on the alternate unit. If a Disk File
is to be updated, the last tape of the previous Disk File must be loaded on the
first Disk File output unit. The Key-Address tape(s) are loaded on S.SU05 (B5).


2.5.2 Key-Address Sort


Code  Name:  KEYSRT 

Programmer: Peter J. Brown

Abstract: KEYSRT sorts the Key-Address tape which is output from program
NUFILE. The key-address pairs are sorted in ascending order according to
key number. It maintains the ascending order of addresses as they are produced
by NUFILE.

2.5.2.1 Program Description

KEYSRT is an IBSRT program. It is a logical sort wherein the sign bit
of a word is considered the high-order bit of that word. The option EQUALS
ensures that in the case of a tie (i.e., two keys being the same), the records
are put on the output tape in the same order as they appeared on the input
tape. This keeps the addresses corresponding to these keys in ascending order.
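The two properties of the sort, a logical comparison in which the sign bit is the high-order bit and tie order preserved as with EQUALS, can be shown in a few lines of Python. This is an illustration, not IBSRT: records are hypothetical four-integer tuples, and a negative Python integer is assumed to stand for a sign-magnitude word with the sign bit set. Python's `sorted` is guaranteed stable, which supplies the EQUALS behaviour.

```python
from typing import List, Tuple

Record = Tuple[int, int, int, int]  # key high word, key low word, disk addr, tape addr
SIGN_BIT = 1 << 35


def logical(word: int) -> int:
    """Read a sign-magnitude word as an unsigned 36-bit value.

    A negative Python int stands for a word with the sign bit set (an
    assumption of this sketch), so it sorts above every positive word.
    """
    return (SIGN_BIT | -word) if word < 0 else word


def keysrt(records: List[Record]) -> List[Record]:
    # sorted() is stable: records with equal keys keep their input order,
    # so their addresses stay in ascending order as EQUALS ensures.
    return sorted(records, key=lambda r: (logical(r[0]), logical(r[1])))
```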

2.5.2.2 Program Structure

KEYSRT is a main program which requires as input the Key-Address
tape which is output from NUFILE. The output produced is the Sorted Key-Address
tape, which is in the same format as the input tape. This tape becomes the input
to program MERGE.

2.5.2.3 Operator Instructions

The input tape must be mounted on unit B5 (S.SU05), and the output tape
is mounted on unit C4 (S.SU04). This is an order four sort. This requires, in
addition to the input and output units, eight scratch units, four on each
channel. If there are not enough units available, there are two alternatives:

(1) If short by one unit, the input unit can be used as a
scratch. Push START when IBSRT requests another unit.
Then, when the information on the input tapes has been
read, IBSRT will request that a scratch be mounted and
readied on that unit. The output unit can be utilized
similarly: before the last phase of the sort, IBSRT
will indicate the unit on which the final sorted
information will be placed.

(2)  The  other  alternative  is  to  select,  on-line,  a  lower 
order  sort.  This  is  standard  operating  procedure. 

IBSRT will request that the number of input reels be "keyed" in.


2.5.3  Key-Address  Merge 


Code  Name:  MERGE 

Programmer:  Peter  J.  Brown 

Abstract: MERGE combines the Sorted Key-Address tape (see NUFILE and
KEYSRT) with the Old Merged Key-Address tape containing all the keys in the
file (prior to the present run) and the addresses of their occurrences. This
combination results in a New Merged Key-Address tape, which is an inverted file
(by keys) of all the compounds in the file. This tape is the input to program
INDEX, which creates a three-level key-to-compound Locater Table.

2.5.3.1 Program Description

MERGE  groups  together  all  the  Search  File  addresses  paired  with  the  same 
key  on  the  Sorted  Key-Address  tape.  These  addresses  are  written  on  the  New 
Merged  Key-Address  tape  in  one  list,  with  the  key  present  only  at  the  head  of 
the  list.  This  is  illustrated  in  the  following  example: 

Sorted Key-Address Tape      New Merged Key-Address Tape

Key 1                        Key 1
Address I                    Address I
Key 1                        Address II
Address II                   Key 2
Key 2


During  update  runs,  MERGE  combines  the  Old  Merged  Key-Address  tape  with 
the  Sorted  Key-Address  tape  to  produce  a  New  Merged  Key-Address  tape. 
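The grouping and update merge can be sketched in Python. Keys are modelled as hypothetical `(high word, low word)` tuples and each occurrence as a `(disk address, tape address)` pair. When a key appears on both tapes, the old occurrences are placed first on the assumption that compounds added in the present run received higher file addresses. `record_words` shows how one logical record of the New Merged Key-Address tape would be laid out, following the record format given in the program structure below.

```python
from itertools import groupby
from typing import Iterable, List, Sequence, Tuple

Key = Tuple[int, int]   # (high-order word, low-order word)
Addr = Tuple[int, int]  # (disk address, tape address)
KeyList = Tuple[Key, List[Addr]]


def merge(sorted_pairs: Iterable[Tuple[Key, int, int]],
          old: Sequence[KeyList] = ()) -> List[KeyList]:
    # One list per key, with the key present only at the head of the list.
    new = [(key, [(d, t) for _, d, t in group])
           for key, group in groupby(sorted_pairs, key=lambda r: r[0])]
    out: List[KeyList] = []
    i = j = 0
    while i < len(old) or j < len(new):
        if j == len(new) or (i < len(old) and old[i][0] < new[j][0]):
            out.append(old[i])
            i += 1
        elif i == len(old) or new[j][0] < old[i][0]:
            out.append(new[j])
            j += 1
        else:  # key on both tapes: earlier (lower) addresses first
            out.append((old[i][0], old[i][1] + new[j][1]))
            i += 1
            j += 1
    return out


def record_words(key: Key, addresses: List[Addr]) -> List[int]:
    words = [key[0], key[1]]
    for disk, tape in addresses:
        words += [disk, tape]
    return words + [0]  # word of zeros ends the list
```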

2.5.3.2 Program Structure

MERGE is a main program which requires as input (1) the Sorted Key-Address
tape and (2) the Old Merged Key-Address tape. The format of the Sorted
Key-Address tape is described in Section 2.5.1.2.

The Old Merged Key-Address tape is present for Update runs only; no tape
is needed when creating a new Index. This tape, which is the output of a
previous run of program MERGE, is blocked in variable-length records, maximum
block size 1000 words. A logical record consists of a key and its occurrences,
and thus it varies in size. At the end of each tape is an end-of-file mark
followed by a dummy block. Two consecutive end-of-file marks terminate the
last tape. The following is the logical record format:


Word     Contents
1        Key (1st half)
2        Key (2nd half)
3        Disk File address of 1st compound containing this key
4        Tape File address of 1st compound containing this key
...
2n-1     Disk File address of last compound containing this key
2n       Tape File address of last compound containing this key
2n+1     Word of zeros (end-of-list sentinel)

The program produces as output the New Merged Key-Address tape. This
tape has the same format as the Old Merged Key-Address tape.

2.5.3.3 Operator Instructions

When running MERGE, the Sorted Key-Address tape(s) should be loaded
alternately on units S.SU04 (C4) and S.SU06 (C5). For an Update run, the Old
Merged Key-Address tape(s) should be loaded alternately on S.SU05 (B5) and
S.SU07 (B6). The New Merged Key-Address tape(s) should be loaded alternately
on S.SU10 (B1) and S.SU19 (C6).

The on-line typewriter will request that the number of reels be keyed
in. The operator must key in the number of Old Merged Key-Address tape reels.
For a 'nufile' run, for which there is no Old Merged Key-Address tape, key in
zero reels.


2.5.4 Index Creation

Code Name: INDEX


Programmer:  Peter  J.  Brown 

Abstract: The key-to-compound locater table, used by the CIDS Retrieval
System, is created by program INDEX from the inverted key list produced by
program MERGE.

2.5.4.1 Program Description

INDEX converts the inverted key list (output from program MERGE)
into a three-level key-to-compound locater table. The inverted key list,
as it appears on the Merged Key-Address tape, contains each key in the system,
followed by the Tape Search File and Disk Search File addresses of all
corresponding compound records. From this, a Locater Table, or Index, may be
created for either the Tape File or the Disk File. Either option may be selected
by placing an appropriate parameter card at the end of the program deck.

INDEX places each address list from the inverted key list onto a new file
(called the List-of-Addresses), which differs from the original in two ways:
the "header" key of each list is removed, and each list is reduced to addresses
of compounds in only one of the two search files. Each header key, together
with the location on the List-of-Addresses of the list corresponding to this
key, is placed on a second new file called the Key-Address List. Both of these
files are produced on tape, but are blocked in 465-word records to permit easy
transfer to a disk unit. The last key in each record is placed on a third
file, called INDX, along with its track (or record) number. This file is small
and is loaded into core at search time.

Figure 47 illustrates the construction of a three-level Tape File Index
from the Merged Key-Address Tape. In the diagram, D represents the Disk File
address and T represents the Tape File address. A_j represents the location,
on the List-of-Addresses level of the Index, of the addresses corresponding
to Key_j.
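The three files can be illustrated with a small Python sketch. It is not INDEX itself: it assumes the merged records are already decoded into `(key, [(disk, tape), ...])` tuples in ascending key order, numbers List-of-Addresses tracks from 0 and Key-Address List tracks from 1 (following the address formats given below), and gives the sentinel key a placeholder address. The function name and the sentinel's address are assumptions.

```python
from typing import Iterable, List, Tuple

TRACK_WORDS = 465     # words per record/track, from the text
KEYS_PER_TRACK = 153  # three-word Key-Address entries per track, from the text
SENTINEL_KEY = (2**36 - 1, 2**36 - 1)  # "all 1 bits" in both 36-bit key words

Key = Tuple[int, int]


def build_three_level_index(records: Iterable[Tuple[Key, List[Tuple[int, int]]]],
                            use_disk: bool = True):
    """Return (list_of_addresses, key_address_list, indx) for ascending-key records."""
    list_of_addresses: List[int] = []
    key_address_list: List[Tuple[Key, Tuple[int, int]]] = []
    for key, pairs in records:
        # (track, relative address) where this key's list begins
        location = divmod(len(list_of_addresses), TRACK_WORDS)
        key_address_list.append((key, location))
        # keep only the addresses for the selected search file; drop the header key
        list_of_addresses.extend(disk if use_disk else tape for disk, tape in pairs)
        list_of_addresses.append(0)  # word of zeros ends each list
    # sentinel entry; its address is only a placeholder in this sketch
    key_address_list.append((SENTINEL_KEY, divmod(len(list_of_addresses), TRACK_WORDS)))
    # INDX: last key on each Key-Address List track, with the track number (from 1)
    indx: List[Tuple[Key, int]] = []
    for start in range(0, len(key_address_list), KEYS_PER_TRACK):
        end = min(start + KEYS_PER_TRACK, len(key_address_list))
        indx.append((key_address_list[end - 1][0], start // KEYS_PER_TRACK + 1))
    return list_of_addresses, key_address_list, indx
```

Because the sentinel is always the final Key-Address entry, it is also always the last key on INDX, as the text notes.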

2.5.4.2 Program Structure

INDEX is a main program which requires as input the Merged Key-Address
tape, which is output from program MERGE. Its format is described in
Section 2.5.3.2. The program produces as output (1) the List-of-Addresses
tape and (2) the Key-Address List. The latter tape also contains the INDX
level of the Index.

The List-of-Addresses Tape is blocked in 1000-word records. Each list is
followed by a word of zeros. Each tape is terminated by an end-of-file mark
and a dummy block (10 words). The last tape, however, is terminated by two
consecutive end-of-file marks. The addresses within these lists are in
ascending order.

Figure 47. Construction of Three-Level Index
(Diagram labels: Merged Key-Address Tape; last key of track i of the
Key-Address List.)

If the index to the Disk Search File was selected, the format of each
address is:

Bits (S,1-17):  Track No. (1- )

     (18-35):   Relative Address (0-464)

If the Index to the Tape Search File was selected, the format of each
address is:

Bits (S,1-5):   Tape No. (1- )

     (6-18):    Record No. (0- )

     (19-35):   Relative Address (0- )

The Key-Address List contains each key (2 words), coupled with the
address of its corresponding list on the List-of-Addresses tape. The format
of this address is:

Bits (S,1-17):  Track No. (0- )

     (18-35):   Relative Address (0-464)
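The bit layouts above can be expressed as shifts and masks. This Python sketch assumes the usual IBM 7090/7094 convention that bit S is the most significant bit of a 36-bit word and bit 35 the least significant; the function names are illustrative only.

```python
MASK_18 = (1 << 18) - 1


def pack_disk_address(track: int, relative: int) -> int:
    """Track in bits S,1-17 (high 18 bits); relative address in bits 18-35."""
    if not (0 <= track <= MASK_18 and 0 <= relative <= 464):
        raise ValueError("track or relative address out of range")
    return (track << 18) | relative


def unpack_disk_address(word: int):
    return word >> 18, word & MASK_18


def pack_tape_address(tape: int, record: int, relative: int) -> int:
    """Tape in bits S,1-5 (6 bits), record in 6-18 (13 bits), relative in 19-35 (17 bits)."""
    if not (0 <= tape < 1 << 6 and 0 <= record < 1 << 13 and 0 <= relative < 1 << 17):
        raise ValueError("field out of range")
    return (tape << 30) | (record << 17) | relative


def unpack_tape_address(word: int):
    return word >> 30, (word >> 17) & ((1 << 13) - 1), word & ((1 << 17) - 1)
```

The Key-Address List address uses the same two-field split as the disk address, so `pack_disk_address` also covers that format.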

Since each logical record is three words, there can be as many as 153 keys
per track. The last key on this file is a special sentinel key (all 1 bits).

Each Key-Address List tape is terminated by an end-of-file mark and a
dummy block, except that the last tape is terminated by two consecutive
end-of-file marks. The third file, INDX, is then placed on this tape, directly
following the two end-of-file marks. This file is always small enough to place
on tape in one block, because 1000 words would accommodate a file containing
over 50,000 different keys. This block is then followed by an end-of-file
mark.
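The 50,000 figure can be checked under the assumption that each INDX entry is three words long (as in the INDX record format) and stands for one full Key-Address List track of 153 keys:

```latex
\left\lfloor \frac{1000}{3} \right\rfloor \times 153 = 333 \times 153 = 50{,}949 > 50{,}000
```

This is an upper bound; a partly filled last track and the sentinel entry reduce it slightly.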


The last key to appear on INDX will naturally be the sentinel key, but since
the last track on the Key-Address List may not be completely filled, the
address corresponding to the sentinel key will not necessarily imply word 4--
on that track. The logical record format of INDX is:

Word 1: Key (1st half)

Word 2: Key (2nd half)

Word 3: Track (1-), contained in bits (S,1-17)
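One plausible way a searcher could use the three levels is sketched below in Python: a binary search of INDX (in core) picks the Key-Address List track, a scan of that track finds the key, and the stored address points into the List-of-Addresses. This section does not specify the search procedure, so the function and its arguments are assumptions. Two-word keys are modeled as tuples, which compare in the same order as (1st half, 2nd half).

```python
import bisect
from typing import List, Optional, Tuple

Key = Tuple[int, int]


def find_list_address(key: Key,
                      indx: List[Tuple[Key, int]],
                      key_address_list: List[Tuple[Key, Tuple[int, int]]],
                      keys_per_track: int = 153) -> Optional[Tuple[int, int]]:
    """Three-level lookup: INDX -> Key-Address List track -> List-of-Addresses location."""
    last_keys = [last for last, _ in indx]
    i = bisect.bisect_left(last_keys, key)  # first track whose last key >= key
    if i == len(indx):
        return None
    track = indx[i][1]                      # INDX tracks are numbered from 1
    start = (track - 1) * keys_per_track
    for entry_key, address in key_address_list[start:start + keys_per_track]:
        if entry_key == key:
            return address                  # (track, relative) on the List-of-Addresses
    return None                             # key was never assigned to a compound
```

The all-1-bits sentinel guarantees that `bisect_left` finds a track for every real key, so only the final scan can fail.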


Program INDEX also provides a printer listing consisting of each key
present in the Search File and its number of occurrences in the file (the
number of times the key was assigned). The relative address of the list of
Search File addresses corresponding to each key is also provided. This key
listing is a helpful tool in analyzing the usefulness of the keys in the
system and, to some extent, the nature of the compounds in the file. It
provides the querist with an upper limit on the number of responses he can
expect from a query. It can also be an aid in planning search strategy, in
reducing the number of keys that must be utilized for optimum retrieval for
a query.


2.5.4.3 Operator Instructions


The Merged Key-Address tape(s) are loaded alternately on S.SU10 (B1)
and S.SU19 (C6). The List-of-Addresses tape(s) are loaded alternately on
S.SU04 (C4) and S.SU06 (C5). The Key-Address List tape is loaded on
S.SU05 (B5).

A  card  with  the  word  TAPE  or  DISK  punched  in  columns  1-4  must  be  placed 
at  the  end  of  the  program  deck.  This  card  determines  whether  an  index  to 
the  Tape  Search  File  or  the  Disk  Search  File  is  to  be  created. 

The typewriter console will request that the number of input reels be
keyed in.
FILE SEARCH AND RETRIEVAL


This section describes the programs which accomplish the actual file
search and retrieval portion of the CIDS system. This process includes three
separate and distinct actions: the input and preprocessing of the query, the
file search, and the output of retrieved records. There are two systems
involved: the batch search system, which is described by Figure 48, and the
on-line search system, which is described by Figure 49. These are similar in
most respects and are described more fully in Section 1.

3.1 QUERY PREPROCESSING

The following programs read in the queries, do all the necessary
translation from external query formats to internal formats, and intersect the
lists of addresses for keys as specified in the query. They provide the
accession list of actual records to be searched if the query specifies
additional requirements that entail a molecular formula search or an
atom-by-atom search of the connection table.


Figure 48. Batch Search System
3.1.1 Query Input Executive
Code  Name:  INPUT 


Programmer: Richard Haber

Abstract: Program INPUT is used to keep track of the number of queries
correctly entered in the system. It also stores the disk address of each
query in the query disk-core table.

3.1.1.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  50. 

Program INPUT is first used to open the output file on which the
accession list is written by program KIAD. It then gives control to program
EXEC30, which is used to preprocess a query.

When  control  is  returned  from  EXEC30,  INPUT  checks  to  determine  whether 
the query has been accepted or not. If the query has been accepted, a word
containing  the  number  of  queries  is  incremented  by  one,  and  the  disk  address 
of  the  query  is  stored  in  the  query  disk-core  table.  INPUT  then  determines  if 
more  queries  can  be  preprocessed. 

If  more  queries  can  be  preprocessed  or  if  the  previous  query  has  been 
thrown  out,  INPUT  again  gives  control  to  EXEC30  to  preprocess  another  query. 
When  no  further  queries  can  be  preprocessed,  control  is  given  to  program  READ 
which  closes  the  output  file  on  which  the  accession  list  is  written. 

The  number  of  queries  which  may  be  preprocessed  depends  upon  whether  the 
system  is  operating  in  batch  or  real-time  mode.  At  present,  only  one  query  at 
a time may be preprocessed (and then processed) when the real-time system is
used.  Up  to  2000  queries  may  be  preprocessed  at  a  time  when  the  system  is 
operating  in  batch  mode. 
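The control flow of INPUT can be summarized in a Python sketch. It is a model, not the program: `preprocess` stands in for EXEC30 (returning the disk address of an accepted query, or None if the query was thrown out), and the hypothetical `EndOfQueries` exception stands in for READ meeting a `$` in column one and keeping control.

```python
from typing import Callable, List, Optional

REALTIME_LIMIT = 1  # one query at a time in real-time mode
BATCH_LIMIT = 2000  # up to 2000 queries in batch mode


class EndOfQueries(Exception):
    """Hypothetical stand-in for READ meeting a '$' in column one."""


def run_input(preprocess: Callable[[], Optional[int]], batch: bool) -> List[int]:
    """Return the query disk-core table; its length is the count of accepted queries."""
    limit = BATCH_LIMIT if batch else REALTIME_LIMIT
    disk_core_table: List[int] = []
    try:
        while len(disk_core_table) < limit:
            address = preprocess()
            if address is not None:  # accepted: count it and record its disk address
                disk_core_table.append(address)
            # a rejected query simply leads to another preprocessing attempt
    except EndOfQueries:
        pass  # '$' seen: READ closes the accession-list file
    return disk_core_table
```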

3.1.1.2 Program Structure

INPUT  is  initially  given  control  by  program  M0NIT0.  It  normally  calls 
EXEC30  which,  in  turn,  calls  READ.  Control  is  returned  to  INPUT  from  READ 
via EXEC30 unless a $ (signalling the end of queries) is encountered in column
one of any input line (either a line of teletype or a punched card). In this
case READ retains control and INPUT is not reentered.


3.1.2 Query Preprocessor

Code Name: EXEC30


Programmer:  Paul  Weinberg 

Abstract: The Query Preprocessor is a set of programs that scans the
source text of a query presented by the user and translates it into the
internal coding of the retrieval system. Queries are checked for syntactical
errors and edited. In this role, the Query Preprocessor communicates with the
other programs of the CIDS system to allow the system to adjust to particular
user requirements.


3.1.2.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  51. 

The entire preprocessor is arranged in one deck, EXEC30. Chains of core
and disk storage are located in deck BUFFRS. There are three parts to the
preprocessor.

To utilize storage in an efficient manner, a set of subroutines
manipulates the lists of core and disk storage used by the preprocessor and
real-time monitor. The technique involves chains of buffer control words, each
control word representing a block of storage. The subroutines which perform
these functions are:

(1) POPTOP (CHAIN, HOL) removes the "top" buffer from the chain
named CHAIN and presents the address of the control word for that
buffer in HOL. If the chain is empty, 0 is returned to HOL.

(2) ADDBUF (CHAIN, EOL) adds the buffer whose control word is
indicated by the address part of EOL to the chain named CHAIN.

(3) MVECHN (CHAIN, HOL) adds the entire list of buffers starting
with the control word indicated in HOL to the end of the chain named
CHAIN.

A  number  of  macros  is  available  to  generate  chain  structures.  A  chain 
should  be  formed  for  each  different  size  of  buffers  used.  Chains  are  used 
to  control  disk  as  well  as  core  storage  pools. 
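The three chain operations can be modeled with a singly linked list of control words, as in the Python sketch below. The class names are illustrative, `None` plays the role of the 0 returned by POPTOP, and the text does not say where ADDBUF places the new buffer; the sketch assumes it goes on top of the chain.

```python
from typing import Optional


class ControlWord:
    """One buffer control word: where the block lives and a link to the next one."""

    def __init__(self, address: int, size: int) -> None:
        self.address = address
        self.size = size
        self.next: Optional["ControlWord"] = None


class Chain:
    def __init__(self, name: str) -> None:
        self.name = name
        self.top: Optional[ControlWord] = None

    def poptop(self) -> Optional[ControlWord]:
        """Remove and return the top buffer; None plays the role of the 0 return."""
        cw = self.top
        if cw is not None:
            self.top = cw.next
            cw.next = None
        return cw

    def addbuf(self, cw: ControlWord) -> None:
        """Add one buffer (assumed here to go on top of the chain)."""
        cw.next = self.top
        self.top = cw

    def mvechn(self, first: Optional[ControlWord]) -> None:
        """Append an entire linked list of buffers to the end of the chain."""
        if self.top is None:
            self.top = first
            return
        tail = self.top
        while tail.next is not None:
            tail = tail.next
        tail.next = first
```

Keeping one chain per buffer size, as the text recommends, means POPTOP never has to search for a block of the right length.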

A second set of subroutines (SCAN) is used to scan the query text,
extracting symbols (strings of non-blank characters or a delimiter character)
in order.
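The symbol rule can be illustrated with a small Python tokenizer. The actual delimiter set is not given in this section, so `DELIMITERS` below is a hypothetical choice; blanks only separate symbols, while each delimiter becomes a symbol by itself.

```python
from typing import List

DELIMITERS = set("/,()=*.")  # hypothetical; the real set is not listed in this section


def scan(text: str) -> List[str]:
    """Split query text into symbols in order of appearance."""
    symbols: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch == " " or ch in DELIMITERS:
            if current:  # close the symbol being collected
                symbols.append("".join(current))
                current = []
            if ch != " ":
                symbols.append(ch)  # a delimiter is a symbol of its own
        else:
            current.append(ch)
    if current:
        symbols.append("".join(current))
    return symbols
```

For instance, `scan("DEFINE K1/FRAG 011001/")` yields `DEFINE`, `K1`, `/`, `FRAG`, `011001`, `/`.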


A  third  set  of  subroutines  is  included  to  interpret  the  query  statements 
as  they  are  scanned,  and  set  up  the  block  of  information  that  will  represent 
the  query  internally  for  the  retrieval  system. 

The basic mechanism of the preprocessor is to scan the input text until
a symbol has been collected. A table (known as the dispatcher or scan table)
is then consulted to transfer to a routine which interprets the symbol. The
scan table (Table I) lists different transfer points depending on the part
of the query in which the scanned symbol has been detected. Consulting this
table, for example, the symbol STRUCTURE causes the routine IS to be entered
if it is encountered while scanning an INPUT string, but causes the routine
D.DFST to be entered if it is encountered while scanning a DEFINE statement.
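A context-dependent scan table maps naturally onto a dictionary keyed by (part of query, symbol). In this Python sketch, only the two STRUCTURE entries come from the text; the routine bodies are hypothetical stand-ins that just report which routine was entered.

```python
from typing import Callable, Dict, Tuple


def routine_is(symbol: str) -> str:  # hypothetical stand-in for routine IS
    return "IS"


def routine_d_dfst(symbol: str) -> str:  # hypothetical stand-in for routine D.DFST
    return "D.DFST"


# Scan table: (part of the query being scanned, symbol) -> interpreting routine.
SCAN_TABLE: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("INPUT", "STRUCTURE"): routine_is,
    ("DEFINE", "STRUCTURE"): routine_d_dfst,
}


def dispatch(context: str, symbol: str) -> str:
    """Transfer to the routine registered for this symbol in this context."""
    routine = SCAN_TABLE.get((context, symbol))
    if routine is None:
        raise KeyError(f"no transfer point for {symbol!r} in {context!r}")
    return routine(symbol)
```

The same symbol therefore reaches different routines purely through the context part of the key, which is the point of Table I.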


Each  routine  may  call  on  SCAN  to  remove  successive  symbols  from  the  input 
text  and  process  them.  At  its  conclusion  the  routine  returns  to  the  main 
scanning  mechanism  with  a  new  assumed  location  for  the  input  scan. 

When an END statement has been encountered, the preprocessor assembles
the sections of the query that have been collected from individual
statements. Routine KIAD is called to compute an accession list and the query
is formatted for searching. The query is then written onto a disk buffer
and control is returned to the search programs, which will read the query back
in when searching starts.

The input accepted by the preprocessor is described in a separate
document describing the query language. The internal format for queries is
outlined in Table II.

3.1.2.2 Program Organization

EXEC30 is a subroutine which is called by a standard CALL statement.

Its  entry  name  is  D.STRT. 

Input  to  EXEC30  is  the  BCD  query  text  stored  in  a  buffer  whose  location 
is  given  in  D.INBF. 

The output is the query in internal format, stored on disk. The
disk buffers holding the output have the name OUCUAN.

The Query Preprocessor generates the following messages:

1. ILLEGAL QUERY CXXXXX.

The first symbol in the query cannot be used as a query identifier.
'C' identifies the console. XXXXX is the bad symbol. The
query is skipped.

2. UNABLE TO LOCATE QUERY

The query preprocessor cannot find a syntactically correct query
in the input text presented.

3. COMMENTS FOR QUERY CXXXXX.

A query has been found and comments regarding it follow. 'C'
identifies the console. XXXXX is the query name.

4. ASSUMED PRIORITY OF 0.

5. ILLEGAL STRING SENTINEL XXXXXX


In looking for a statement head, the symbol XXXXXX has been
encountered and is not a legal statement name. The symbol is ignored.

6. REPETITION TABLE OVERFLOW SCANNING {KEYS | STRUCTURE}

Too  many  (more  than  36)  defined  keys  or  structures  appear  in  a 
logical  statement. 

7. UNDEFINED SYMBOL XXXXX DETECTED DURING SCAN OF LOGICAL STATEMENT
{KEYS | STRUCTURE}

A symbol which has not been defined in a DEFINE statement has
been used in a logical statement. The symbol is ignored.

8. OVERFLOW OF MINTERM TABLE FOR {KEYS | STRUCTURE}

More than 463 'OR' connectives appear in a logical statement.
The statement is ignored.

9.  SYNTAX  ERROR  IN  KEY  XXXXXX 

General purpose error message in the program that converts CIDS keys
(appearing in DEFINE statements) to the internal format of the search
system.

10.  SYNTAX  ERROR- IGNORE  XXXXXX 

Symbol  XXXXXX  appears  out  of  place  and  has  been  ignored. 

11.  OVERFLOW  OF  DEFINITION  AREA 


A  slash  is  missing  in  a  DEFINE  statement. 

12.  SKIPPING  FORMULA 

Used  to  indicate  a  syntax  error  in  a  FORMULA  statement  that  has 
caused  it  to  be  skipped. 

13.  SKIPPING  TO  END  OF  QUERY 

A  catastrophic  error  in  the  specification  of  a  query,  typically 
the  omission  of  a  logical  expression  for  keys,  has  caused  the  entire 
query  to  be  skipped. 

14. NO DEFINED {KEYS | STRUCTURE} SKIPPING LOGICAL EXPRESSION


3.1.3 Query Reader

Code Name: READ

Programmer: Richard Haber

Abstract: Program READ is used to read queries from an input device. It
can be used to read either punched cards or punched paper tape produced by a
teletype.

3.1.3.1 Program Description

A macro flow chart describing this program is presented in Figure 52.

Program READ is called by program EXEC30 to read a query from an input
device and place it in a buffer for processing by EXEC30. It can read either
punched cards or punched paper tape, and will read an entire query provided the
query is less than 465 words long.

When  called  by  EXEC30,  READ  begins  reading  a  query  one  line  at  a  time.  If 
punched  paper  tape  is  being  read,  a  line  is  considered  to  be  one  line  on  the 
teletype which produced the tape. If punched cards are being used, one line is
equivalent  to  one  card. 

The query is read, one line at a time, until one of the following three
conditions occurs:

(a) END occurs in columns 1-3 of a line. This signifies
that the entire query has been read. The query is
now in a buffer and control is returned to the calling
program.

(b) 465 words of the query have been read. Since a
single buffer contains room for only 465 words, control
is returned to EXEC30, which prepares another
buffer for the remainder of the query. READ is then
called again by EXEC30 to continue reading the query.

(c) A $ occurs in column 1 of a line. This signals the
end of all of the queries. In this case, a dummy
record is written to end the accession list produced
by program KIAD. The file containing the accession
list is closed and is now ready to be sorted. The
query preprocessing has been finished and the
retrieval system is ready to begin searching the file.
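The three stopping conditions can be sketched in Python as follows. This is a model under stated assumptions: input lines are already available as strings, words are counted at six BCD characters per 36-bit word, and the status names and function names are hypothetical. The caller restarts from the returned index after a FULL result, just as EXEC30 recalls READ with a fresh buffer.

```python
from typing import List, Tuple

BUFFER_WORDS = 465  # capacity of one query buffer, from the text
CHARS_PER_WORD = 6  # assumption: six BCD characters per 36-bit word


def words_for(line: str) -> int:
    return -(-len(line) // CHARS_PER_WORD)  # ceiling division


def read_query(lines: List[str], start: int = 0) -> Tuple[str, List[str], int]:
    """Return (status, buffer, next_index).

    status is 'END' (case a), 'FULL' (case b), 'DOLLAR' (case c), or 'EOF'.
    """
    buffer: List[str] = []
    used = 0
    i = start
    while i < len(lines):
        line = lines[i]
        if line.startswith("$"):  # case (c): end of all queries
            return "DOLLAR", buffer, i
        need = words_for(line)
        if need > BUFFER_WORDS:
            raise ValueError("a single line cannot exceed one buffer")
        if used + need > BUFFER_WORDS:  # case (b): caller supplies another buffer
            return "FULL", buffer, i
        buffer.append(line)
        used += need
        i += 1
        if line[:3] == "END":  # case (a): END in columns 1-3
            return "END", buffer, i
    return "EOF", buffer, i
```

The END line is kept in the buffer, since the preprocessor relies on seeing the END statement before it assembles the query.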

3.1.3.2 Program Structure

READ is a subroutine called by EXEC30 as described above. It may also
be transferred to by program INPUT when that program has determined that no
further queries may be read. In the latter case, the accession list is
terminated and the file closed as described above in case (c), and the
retrieval system is ready to search the file.


Figure  52.  Macro  Flow  Chart  -  READ 
Any number of END statements may be placed at the end (or the
beginning) of a query. However, an error message will be given if the $ statement
does not directly follow an END statement. The last query will not be
preprocessed correctly in this case, but no previous queries will be affected.


3.1.4  Key-Expression  to  Accession  List  Processor 

Code  Name:  KIAD 

Programmer:  James  W.  Gerber 

Abstract: This program accepts the Boolean key expression as input and
produces the list of all compound record addresses that pass the key
expression (the accession list).

3.1.4.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  53. 

KIAD  (Key-Index  Address  Decoder)  is  a  subroutine  which  produces  the 
accession  list  for  a  query.  When  it  is  called  the  accumulator  contains  the 
following  information: 

bits  21-35:  address  of  first  word  of  key  definitions 
bits  3-17:  address  of  last  word  of  minterms 
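The two accumulator fields are the standard IBM 7090/7094 address field (bits 21-35, the low 15 bits) and decrement field (bits 3-17). A minimal Python sketch of forming and splitting such a word, with illustrative function names, follows.

```python
FIELD_MASK = (1 << 15) - 1  # 15-bit address and decrement fields


def pack_address_decrement(address: int, decrement: int) -> int:
    """Address in bits 21-35 (low 15 bits), decrement in bits 3-17."""
    return ((decrement & FIELD_MASK) << 18) | (address & FIELD_MASK)


def unpack_address_decrement(word: int):
    """Return (address, decrement)."""
    return word & FIELD_MASK, (word >> 18) & FIELD_MASK
```

A caller of KIAD would thus pass the first key-definition address as `address` and the last minterm address as `decrement`. MOLE, described later, uses the same two fields for its data location and length.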

KIAD produces pairs of words on the output unit (defined in file
definition card FILE1) with the following format:

Word  1:  Query  number  (internal)  from  external  location  INDEX 
Word  2:  Tape  or  disk  address 

The  file  is  opened  and  closed  by  the  search  executive. 

KIAD  uses  the  index  produced  by  program  INDEX.  This  index  is  in  three 
parts.  The  first  part,  the  list  of  addresses,  is  loaded  onto  S.SU15  (disk) 
and  the  second,  the  Key-Address  list,  is  loaded  onto  S.SB14.  The  third  part 
of  the  index  must  be  loaded  into  core  memory  at  location  INDX.  This  table 
is  1000  words  or  less.  Table  INDX  is  external  to  KIAD. 

There  are  two  versions  of  program  KIAD  at  present.  The  only  difference 
between  them  lies  in  the  handling  of  the  output  file  definition.  The  tape 
system  version  has  the  file  defined  with  a  FILE  pseudo-op  card  in  KIAD.  In 
the  Pseudo-realtime  system  (disk  system)  the  file  definition  is  contained  in 
a  $FILE  card  in  the  main  link  and  is  thus  external  to  KIAD. 

KIAD makes use of the routines ADDBUF, POPTOP, and MVECHN. These routines
allow dynamic disk-buffer storage allocation and are used to produce
intermediate storage of partial results. The name of the free storage chain is
(external) DISKBF. This chain should contain enough buffers to accept the
longest list produced by KIAD. This list length is the longer of:

(1)  The  length  of  the  longest  of  the  accession  lists  produced 
by  the  first  literal  in  each  minterm 

(2)  The  total  length  of  the  accession  list  on  exit  from  KIAD. 

The input format is described in Table II.


Figure 53. Macro Flow Chart - KIAD
KIAD first translates the Key Definition table as follows:

1. The word in core following the last minterm is replaced by n.
(This word will be restored before KIAD returns to its caller.)

2. The list of key definitions is transmitted to KEYLST.

3. The Key Index is unpacked from a two-to-a-word representation to a
one-to-a-word representation, and the result is transmitted to KEYIND.

4. A pointer is set up which points to the minterms in location
MINTRM.

KIAD then proceeds to process the query. Each minterm is handled
separately and the result is merged with the list from the previous minterm
(if any) by routine MERGE. KIAD intersects lists within each minterm by
calling routine INTSEC. INTSEC examines location LITSGN to determine the sign
of the current literal. LITSGN will be plus if the literal is nonnegated.

The result of processing a minterm is left in disk buffer PARTMT. During
intersection, the new result is in buffer chain NWPTMT. MERGE accepts the
results of the minterm in PARTMT and the list in PARTMG and produces, in
PARTMG, a list containing the union of the addresses of both lists. It uses
NWPMG during merging to store the intermediate result. INTSEC accepts the list
in PARTMT and intersects it with the list for the key whose location is in
location LISTAD. INTSEC expects to find the number of occurrences for the key
in location NUMOCC. This will be set by GETKEY, the routine that finds a
key in the index and sets up a pointer to it in LISTAD. GETKEY will print a
message if the key is not in the index (i.e., it has not been assigned to any
compound on file). The processing of the query will continue as if the list
for that key were present but had no items in it. If the key appears
nonnegated in a minterm, that minterm will produce no items for the accession
list. If the key appears negated, it will have no effect on the items in the
accession list.

No minterm may consist only of negated literals. If such a minterm is
encountered by KIAD, it will be treated as if its first key (leftmost bit in
minterm word) were nonnegated. Thus, the following query:

NOT K1 and NOT K2 and NOT K3
would be translated as:

K1 and NOT K2 and NOT K3

The  order  of  negated  and  nonnegated  literals  in  the  query  is  unimportant. 

KIAD  treats  the  multiple  occurrence  of  a  key  to  mean: 

"n  or  more  of  ...key" 

thus, NOT 5K5 means:

NOT 5 or more of K5 (i.e., none, one, two, three, or four of this key).


The only limitation on the length of the query is that only 36 key definitions
are allowed and that the minterm list be on one disk track (i.e., that it all be
in core memory at once when KIAD is called). This limits the number of minterms
to a maximum of approximately 175-200. There is no limitation to the length of
address lists that KIAD can handle except for the number of buffers available
in the disk buffer chain DISKBF at the time KIAD is entered. KIAD returns all
buffers used to the free chain as soon as they are not needed by the program.

On exit, KIAD has returned all buffers used during its processing.

3.1.4.2 Program Structure

KIAD is a subroutine whose input is the key expression part of the query
(see query internal formats, Table II).

The  output  is  the  accession  list. 
3.1.5 Key Packing Program
Code Name: PACKEL


Programmer: James W. Gerber

Abstract: This program translates the individual key names from query
language format to internal format.

3.1.5.1 Program Description

A macro flow chart describing this program is presented in Figure 54.

PACKEL is a subroutine. When it is called, the accumulator contains the
location of the first word of the key definition as scanned by the executive
and the number of words in the key definition. PACKEL returns the key in the
accumulator. If a syntactical error is found in the key definition, the
accumulator is set to zero.

The possible key formats for input to PACKEL are as follows (it is to be
understood that the key in core has been scanned in the standard CIDS input
manner):

FRAGMENT  KEYS:  FRAG  aaaaaa  bbbbbb  or  FRAG  aaaaaa 

aaaaaa  is  the  first  part  of  the  key  and  bbbbbb  is  the  second  part 
of  the  key  and  may  be  omitted  if  zero. 

RING MOLFORM KEYS: RINGMF el count el count ...

SKELETON MOLFORM KEYS: SKELMF el count el count ...

el is the element kind and is either one or two characters.
count is the number of that element and must be explicitly mentioned
even if one. At least one space must follow the element kind, and
there must be at least one blank between each count and the element
kind which follows.

There may be only three elements mentioned which are not one of these:

C, N, O, S, P.

The total count of all elements not specifically mentioned in RINGMFs
and SKELMFs as specified hetero elements should be included in the
count under the element kind UH (unspecified hetero). UH counts as
one of the three elements mentioned above. There is no restriction
on the order of elements in the input, except that C, N, O, S, P must
precede other elements. PACKEL reorders the elements to correspond
with the assigned order.
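The RINGMF/SKELMF rules can be checked with a small Python validator. The assigned element order is not given in this section; the sketch assumes C, N, O, S, P in that order, keeps other elements in input order, and returns None where PACKEL would set the accumulator to zero. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

COMMON = ["C", "N", "O", "S", "P"]  # assumed assigned order for the common elements
MAX_OTHERS = 3                      # at most three other kinds, UH included


def parse_element_counts(tokens: List[str]) -> Optional[List[Tuple[str, int]]]:
    """Validate 'el count el count ...' and reorder it; None models a zero accumulator."""
    if not tokens or len(tokens) % 2:
        return None  # every element needs an explicit count
    pairs: List[Tuple[str, int]] = []
    for el, count in zip(tokens[0::2], tokens[1::2]):
        if not (1 <= len(el) <= 2 and el.isalpha() and count.isdigit()):
            return None
        pairs.append((el, int(count)))
    seen_other = False
    for el, _ in pairs:
        if el in COMMON and seen_other:
            return None  # C, N, O, S, P must precede other elements
        seen_other = seen_other or el not in COMMON
    others = [p for p in pairs if p[0] not in COMMON]
    if len(others) > MAX_OTHERS:
        return None
    common = sorted((p for p in pairs if p[0] in COMMON), key=lambda p: COMMON.index(p[0]))
    return common + others
```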

REDUNDANT NUMERIC RING POPULATION KEYS: RNRP n1 or RNRP n1, n2, n3, ..., nm

where m is less than or equal to 17. Each of the n's may be any number
greater than or equal to 3. If a number greater than 17 (octal), i.e. 15
(decimal), is mentioned, it will be translated into the number 1, which is the
code used for rings with more than 15 atoms. The order of the n's should be
increasing (i.e., n(i) is less than or equal to n(i+1)).
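The RNRP translation rule can be sketched in Python. The sketch treats an explicit 1 as "more than 15 atoms" (so `RNRP 6,1` and `RNRP 6,20` agree, as the examples state) and, as an assumption, rejects a decreasing sequence; the text only says the order "should be" increasing. The function name is hypothetical.

```python
from typing import List, Optional

MAX_RINGS = 17       # m <= 17, from the text
LARGE_RING_CODE = 1  # code for rings with more than 15 atoms


def translate_rnrp(sizes: List[int]) -> Optional[List[int]]:
    """Return the translated ring sizes, or None for an invalid key."""
    if not 1 <= len(sizes) <= MAX_RINGS:
        return None
    # read an explicit 1 as "more than 15" so the ordering check treats it as large
    actual = [16 if n == LARGE_RING_CODE or n > 15 else n for n in sizes]
    if any(n < 3 for n in actual):
        return None
    if any(a > b for a, b in zip(actual, actual[1:])):
        return None  # assumption: a decreasing order is treated as an error
    return [LARGE_RING_CODE if n > 15 else n for n in actual]
```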


Figure 54. Macro Flow Chart - PACKEL

COUNT KEYS: NUMBER s count

s is the subtype. It must be explicitly mentioned, even if 0. It
is one digit except for subtype 10. count is the number of this
particular feature and should be typed with leading zeros omitted. A count of
0 should be explicitly mentioned.

MOLFORM KEYS: MOLFRM el count

el is the element kind and is one or two characters.
count is the number of that element and is typed with leading
zeros omitted. A count of 0 is used only for those elements
which do not have molform keys assigned for specific counts but
have keys assigned indicating only the presence of the element.

NONSPECIFIC  KEYS:  NONSPC  el 

el  is  the  element  kind  and  is  one  or  two  characters. 

EXAMPLES: 

FRAG  011001  FRAG  011001  FRAG  01A015  000001 


RINGMF  C  A  N  1  HG  1 
SKELMF  C  22  N  3  HG  2 
RNRP  6  RNRP  5, 6,6, 6 

RNRP 6,20 RNRP 6,69 RNRP 6,999999 (all of these will be
translated the same as RNRP 6,1 would have been translated)*


NUMBER  4  0 
MOLFRM  C  12 
NONSPC  TE 


NUMBER  10  0 
MOLFRM  H  70 


NUMBER  0  5 
MOLFRM  BE  0 


3.1.5.2 Program Structure

PACKEL  is  a  subroutine  whose  input  is  the  key  in  query  language  as 
scanned  by  the  input  scanner  and  whose  output  is  the  key  in  internal  format. 


* Note that replacing a number greater than 15 (decimal) with 1 in the input will
not cause any change in the translation scheme, but it is not recommended,
since it is not as mnemonic.
3.1.6 Connection Table Processor


Code  Name:  MOLE 

Programmer:  Peter  J.  Brown 

Abstract: Program MOLE processes chemical structural data in the form
of manually generated connection tables. The data is converted into an
intermediate format which is then compressed into the CIDS internal connection
table by program CONVRT.

3.1.6.1 Program Description

The structural information is edited into three separate lists which
serve directly as input to CONVRT. The format of these lists and of the
connection table produced by CONVRT is fully discussed in the documentation of
that program. MOLE does little error checking, and if it does encounter a
symbol which it does not recognize, or which is out of place, control is
returned to the calling program with minus zero stored in location MOLE.

A  molecular  formula  is  created  from  the  structural  information  and  this 
is  output  along  with  the  connection  table.  Abnormalities,  such  as  charge, 
unusual  valence,  and  mass,  are  indicated  in  the  input  structure,  and  these  are 
reformatted  into  a  coded  abnormality  table.  The  actual  format  of  this  is 
discussed  below. 

3.1.6.2 Program Structure

Program MOLE is a subroutine which is called by EXEC30 to convert a
user's structural connection table to the internal format. It is a subroutine
of the Retrieval System.

The structural information which is the input to MOLE is provided in BCD
as it was punched on cards, after being preprocessed by program SCAN (Section
3.1.2.1). An asterisk may appear between the bond and the connection to
indicate that the bond is in a ring. Abnormalities are set off by parentheses.
All blank characters are ignored. The following example illustrates this:


1 NIC-10-1*2-1*2-1*6. 2C1*1 -2*3. 3C2*2-1*4. 401*3-1*5. 5Cl*4-2*6  6C2*5-1*1. 
(V1-5.C1-+1,! 
MOLE must be given the core location of this data and the length of the
array. These are to be provided by the calling program in the address and
decrement portions of the accumulator, respectively.

MOLE generates the following block of data containing the molecular formula,
connection table, and abnormality table:

Word                Contents
(A = address portion, bits 21-35; D = decrement portion, bits 3-17)

1                   A = No. of words preceding the C.T. (X+2)
                    D = Total number of words (X+Y+Z+2)

2                   A = No. of words in the C.T. (Y)
                    D = No. of words preceding the Abnormality Table
                        (X+Y+2, or 0 if no abnormalities)

3 to X+2            Molecular Formula (X words)

X+3 to X+Y+2        Connection Table (Y words)

X+Y+3 to X+Y+Z+2    Abnormality Table (Z words)

The location of this block is stored in the accumulator, bits (21-35),
with the length in bits (3-17), when control is returned to the calling program.

The molecular formula is stored in the same format as the Hill formula
for a file compound.

The Abnormality Table consists of a series of words, where each word
contains information about one atom which has an "abnormality": charge,
mass, valence, or attachments. The format of these words is:

Bits        Contents

(S,1,2)     Type of abnormality:
              101 = charge
              110 = mass
              111 = valence
              100 = attachments

(3-17)      Atom number

(18)        1 if negative abnormality value (e.g., a negative charge);
            otherwise 0

(21-35)     Value of abnormality: mass, valence, signed charge, or
            number of attachments

A  word  of  zeros  follows  the  last  abnormality  word.  This  is  included 
in  the  length  of  the  output  buffer. 
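The output block and the abnormality word layout above can be illustrated with a short Python sketch. This is not MOLE itself: the helper names (`put`, `get`, `abnormality_word`, `build_block`) are hypothetical, bit S is treated as bit 0 (the high-order bit of a 36-bit word, so bit 35 is the low-order bit), a negative value is assumed to be stored as its magnitude with bit 18 set, and the terminating word of zeros is assumed to be counted in Z.

```python
WORD_BITS = 36  # IBM 7040 word length

def put(word, value, first_bit, last_bit):
    """Store value in bits first_bit..last_bit (bit S = 0 is the high-order bit)."""
    width = last_bit - first_bit + 1
    shift = WORD_BITS - 1 - last_bit
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in bits {first_bit}-{last_bit}")
    mask = ((1 << width) - 1) << shift
    return (word & ~mask) | (value << shift)

def get(word, first_bit, last_bit):
    """Read bits first_bit..last_bit back out of a word."""
    width = last_bit - first_bit + 1
    return (word >> (WORD_BITS - 1 - last_bit)) & ((1 << width) - 1)

# Type codes held in bits (S,1,2) of an abnormality word
ABNORMALITY_TYPE = {"charge": 0b101, "mass": 0b110, "valence": 0b111, "attachments": 0b100}

def abnormality_word(kind, atom, value):
    """One abnormality word; a negative value is stored as its magnitude with bit 18 set
    (an assumption about how the sign flag and the value field combine)."""
    word = put(0, ABNORMALITY_TYPE[kind], 0, 2)
    word = put(word, atom, 3, 17)
    if value < 0:
        word = put(word, 1, 18, 18)
    return put(word, abs(value), 21, 35)

def build_block(mf_words, ct_words, abnormalities):
    """Assemble the two header words, molecular formula, connection table and
    abnormality table; header fields go in the address (21-35) and decrement (3-17)."""
    ab_words = [abnormality_word(*a) for a in abnormalities]
    ab_words.append(0)  # terminating word of zeros, assumed to be counted in Z
    x, y, z = len(mf_words), len(ct_words), len(ab_words)
    word1 = put(put(0, x + 2, 21, 35), x + y + z + 2, 3, 17)
    word2 = put(put(0, y, 21, 35), (x + y + 2) if abnormalities else 0, 3, 17)
    return [word1, word2, *mf_words, *ct_words, *ab_words]
```

For a two-word formula, a three-word connection table and one charged atom, the first header word carries A = 4 and D = 9, matching X+2 and X+Y+Z+2 in the table.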


3.1.7  Molecular  Formula  Translator 


Code Name: MOPACK
Programmer:  James  W.  Gerber 

Abstract: This program translates the query molecular formula to internal
query format. It checks the query molecular formula for syntactical errors and
indicates these to the calling program.

3.1.7.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  55. 

MOPACK translates the molecular formula from external format to internal
format. The external format is contained in EXEC30, described in Table 11,
and the internal format is contained in MOLFRM, described in Section 3.2.2.2.


3.1.7.2 Program Structure

MOPACK is a subroutine which takes as input the molecular formula as
scanned by the query scanning program.

Input to MOPACK is obtained through SCAN69, a routine external to MOPACK.
SCAN69 is called to obtain a scanned word; each time it is called, it must
return with the next scanned word in the accumulator.
MOPACK will return the following output:

(a)  Accumulator  sign  positive  and  molform  in  BUFF  if  molform 
is  found  to  be  correct. 

(b)  Accumulator  sign  negative  if  a  syntactical  error  is  found. 

In  this  case  MOPACK  will  print  the  message  SYNTAX  ERROR 
before  returning. 

MOPACK  uses  external  routines  SCAN69  and  BCDBIN  as  well  as  JOBOUL.  Entry 
points  for  MOPACK  are  MOPACK  and  BUFF. 
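The sign convention above (positive with the formula in BUFF, or negative after printing SYNTAX ERROR) can be mimicked by a toy translator. This sketch is hypothetical throughout: it reads a plain string rather than calling SCAN69 word by word, accepts only element symbols with optional counts, treats a repeated element as an error, and returns the elements in the query ordering given in Section 3.2.2.1 (carbon, hydrogen, then alphabetical). The real external format in Table 11 is richer than this.

```python
import re

# An element symbol (capital letter, optional lower-case letter) and an optional count
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def mopack(formula):
    """Return (True, [(element, count), ...]) or (False, None) after printing SYNTAX ERROR."""
    counts = {}
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None or m.group(1) in counts:
            print("SYNTAX ERROR")
            return False, None
        element, digits = m.groups()
        counts[element] = int(digits) if digits else 1
        pos = m.end()
    if not counts:
        print("SYNTAX ERROR")
        return False, None
    # Carbon first, hydrogen second, the remaining elements alphabetically
    order = sorted(counts, key=lambda e: (e != "C", e != "H", e))
    return True, [(e, counts[e]) for e in order]
```

For example, `mopack("OH6C2")` yields `(True, [("C", 2), ("H", 6), ("O", 1)])`, while a lower-case symbol such as in `"C2h6"` is rejected.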
3.2 FILE SEARCH


The programs described in this section perform the actual search of those
compound records which respond to the key requirements of a query. These
programs determine whether the molecular formula or structural requirements
which have been specified by the query are satisfied.


3.2.1  Search  Executives 


Code  Name:  TAPE,  TXINFO,  DISKTT 

Programmer:  Richard  Haber 

Abstract: These programs are used to retrieve compounds from a file
stored either on tape or on disk. With the aid of the molecular formula
search and structure search programs, the executives determine which of the
selected compounds actually satisfy the requirements of various queries.
Output programs are then called to print the resulting compounds.

3.2.1.1 Program Descriptions

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  56. 

Three  slightly  different  search  executive  programs  exist: 

(a)  Program  TAPE  is  used  to  search  a  file  of  compounds 
stored  on  a  series  of  magnetic  tapes.  The  output, 
consisting of query numbers and compound registry
numbers,  is  sorted  by  query  number  and  printed  on 
a  line  printer. 

(b)  Program  TXINFO  is  used  to  search  a  file  of  compounds 
stored  on  a  disk  unit.  The  output,  consisting  of 
query  numbers,  compound  registry  numbers,  molecular 
formulas,  and  structural  diagrams,  is  punched  on  a 
paper  tape.  This  tape  can  then  be  read  by  a  Dura 
Mach  Chemical  typewriter  and  the  results  printed. 

(c)  Program  DISKTT  is  identical  to  TXINFO  except  that  the 
output  consists  of  query  number,  registry  numbers  and 
molforms  and  is  printed  directly  on  a  teletype. 

Structural  diagrams  are  not  part  of  this  output. 

Each  of  the  executives  begins  by  obtaining  (from  a  tape)  the  query 
disk-to-core  table  giving  the  location  of  each  query  stored  on  the  disk. 

The executive then obtains (also from a tape) the merged accession list.
This is a list of pairs of words containing query numbers and compound
addresses. This list has been sorted by compound address, and is used by the
executive to determine which compounds must be further processed as possible
retrievals to the queries.


The  accession  list  is  passed  through  twice.  On  the  first  pass,  each 
query  number  is  checked  to  determine  whether  the  query  has  been  read  into  core 
from  its  storage  location  on  disk.  Queries  which  have  not  previously  been 
entered  into  core  are  entered  at  this  time  until  the  area  set  aside  for  them 
is  filled.  When  a  query  is  entered  into  core,  its  core  location  is  stored  in 
the  query  disk-to-core  table. 


Figure 56. Macro Flow Chart - SEARCH EXECUTIVES (continued)


When the query array has been filled, or when the accession list has been
passed through to completion, the executive begins to pass through the
accession list again. From each pair of words, a compound address and a query
number are obtained. The compound address is used to position the storage device
containing the file to the record containing the compound of interest. This
record is read into core to be searched on the basis of the further specific
requirements of the query. The compound addresses are in ascending order on
the accession list to ensure that the file tapes may be read without
having to back up.

The query number is used to obtain the correct word of the query disk-to-core
table. From this word the location of the query in core is found. The
compound just obtained is tested to determine whether it satisfies the
requirements of the query as described below. While this searching proceeds, the next
pair of words from the accession list is obtained in order to locate and read
the next compound to be tested.

The  query  is  first  checked  to  determine  whether  a  molecular  formula  search 
is  desired.  If  it  is,  the  locations  of  the  correct  query  and  compound  molecular 
formulas  are  obtained  and  control  is  given  to  program  MOLFM  to  perform  the 
search.  Either  the  entire  molecular  formula  (Hill  molform)  or  addend  molecular 
formulas,  or  both,  may  be  searched.  The  addend  molecular  formulas  do  not  have 
to  be  in  the  same  order  in  the  query  and  compound  in  order  to  pass.  Only  one 
query  molform  and  one  compound  molform  are  given  to  MOLFM  at  a  time.  Thus, 
if  addend  molecular  formula  searches  are  desired,  each  query  molecular  formula 
will  be  searched  against  each  compound  addend  molecular  formula  in  order  to 
determine  if  a  match  occurs. 

A  query  may  include  constraint  equations  with  each  molecular  formula. 

These  allow  the  querist  to  use  algebraic  expressions  in  place  of  numbers  in 
the  molform.  For  example,  the  querist  may  require  the  following  molecular 
formula: 

CnH2n+2  where  n=2,3,4,5 
In  this  case,  the  constraint  equation  would  be: 

H=2C+2 

and  the  number  of  carbon  atoms  would  be  required  to  be  between  2  and  5. 

Any constraint equations associated with a query molform are tested by
the executive when MOLFM has indicated that the query and compound molecular
formulas match. If the constraint equations are satisfied, the next query
molecular formula is considered. If the constraint equations are not
satisfied, or MOLFM has indicated that the query and compound molforms do not match,
the  next  compound  molform  is  tested  against  the  current  query  molecular  formula. 
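The addend loop and the constraint test described above can be sketched as follows, using the CnH2n+2 example. The classes and functions here are hypothetical: `molfm` is a simplified stand-in that only checks atom-count ranges, a constraint is assumed to be linear (one element on the left, a weighted sum plus a constant on the right), and the sketch allows two query molforms to be satisfied by the same compound addend, which the text does not settle.

```python
from dataclasses import dataclass, field

@dataclass
class LinearConstraint:
    """Hypothetical encoding of a constraint equation such as H = 2C + 2."""
    target: str                                        # element on the left-hand side
    coefficients: dict = field(default_factory=dict)   # element -> multiplier on the right
    constant: int = 0

    def holds(self, counts):
        rhs = self.constant + sum(k * counts.get(e, 0) for e, k in self.coefficients.items())
        return counts.get(self.target, 0) == rhs

@dataclass
class QueryMolform:
    ranges: dict                                       # element -> (low, high) atom counts
    constraints: list = field(default_factory=list)

def molfm(query, counts):
    """Stand-in for MOLFM: every query element count must fall within its range."""
    return all(lo <= counts.get(e, 0) <= hi for e, (lo, hi) in query.ranges.items())

def addend_search(query_mfs, compound_mfs):
    """Each query molform must match some compound addend molform, in any order,
    with its constraint equations satisfied; otherwise the compound fails."""
    for query in query_mfs:
        if not any(molfm(query, c) and all(k.holds(c) for k in query.constraints)
                   for c in compound_mfs):
            return False
    return True

# CnH2n+2 where n = 2, 3, 4, 5: the constraint H = 2C + 2 with 2 <= C <= 5
alkane = QueryMolform({"C": (2, 5)}, [LinearConstraint("H", {"C": 2}, 2)])
```

With this query, a compound whose addends are NaCl and C4H10 passes on its second addend, while C6H14 fails the carbon range even though it satisfies H = 2C + 2.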

If the molecular formula search has passed the compound, or if no molecular
formula search is desired, the query is tested to determine if a structure
search is required. If it is, the locations of the compound structure
and the correct query structure are obtained and control is given to program
STRUC (Section 2.4.8) to perform the search.


More than one structure may appear in a query. These structures must be
combined in a disjunctive normal form logical expression. More than
one occurrence of any structure may be required by the query. In addition, the
absence, rather than the presence, of a structure may be desired. All of
these conditions are handled in a straightforward manner by the executive, which
decides whether or not the structure requirements of the query have been
satisfied. If the requirements have not been met, the executive goes on to consider
the  next  pair  of  words  from  the  accession  list. 
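The executive's decision on the structure requirements can be sketched as evaluating a disjunctive normal form over occurrence counts. The `Term` encoding is hypothetical: each term names a query structure, a required number of occurrences, and whether absence is wanted; "absent" is assumed to mean fewer than the required number of occurrences (with the default of one, the structure simply does not occur). The occurrence counts stand for what STRUC would report for one compound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """One literal of the query's structure expression (hypothetical encoding)."""
    structure: str          # label of a query structure
    times: int = 1          # number of occurrences required
    absent: bool = False    # True if the structure must NOT occur `times` times

def term_holds(term, occurrences):
    present = occurrences.get(term.structure, 0) >= term.times
    return not present if term.absent else present

def structures_satisfied(dnf, occurrences):
    """dnf is a list of conjunctions (lists of Terms); true if any conjunction holds."""
    return any(all(term_holds(t, occurrences) for t in conj) for conj in dnf)
```

For instance, the expression "two occurrences of A, or B without C" is `[[Term("A", 2)], [Term("B"), Term("C", absent=True)]]`.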

If  no  structure  search  was  desired,  or  if  the  structure  search  has  been 
successful,  the  compound  being  searched  is  considered  a  successful  retrieval 
to  the  query.  In  this  case  the  executive  either  outputs  the  compound  directly, 
or  calls  an  output  program  to  do  so. 

(a)  Program  TAPE  writes  the  query  name  and  compound  number  directly 
on  tape.  These  records  are  later  sorted  by  query  number  and 
printed  by  either  program  REGPRN  or  program  EAPRN. 

(b) Program TXINFO calls programs MFOU, DURPIX, and DURADK, which
format and output the query name, compound number, molecular
formulas, and structural diagrams directly on punched paper
tape. This tape, punched in DURA code, may then be printed
on a Dura Mach chemical typewriter.

(c)  Program  DISKTT  prints  the  query  name  and  compound  number 
directly on a teletype. DISKTT calls program MFOU which then
prints  the  compound  molecular  formula  on  the  teletype.  A 
punched  paper  tape  record  of  what  has  been  printed  may  also 
be  obtained. 

After  a  compound  has  been  output,  the  executive  goes  on  to  consider  the 
next  pair  of  words  from  the  accession  list.  The  accession  list  is  passed 
through  in  the  manner  described  above  until  either  the  entire  list  has  been 
passed through, or a query which has not yet been entered into core is
encountered. In the latter case, the executive again goes through the two-part
process of reading queries and then searching them against the compound file,
starting from where it had left off before.

When  the  entire  accession  list  has  been  traversed  and  the  search  has  been 
completed  the  executive  returns  control  to  the  operating  system. 
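The two-pass traversal of the accession list described above can be sketched in Python. This is an illustration under stated assumptions, not the 7040 program: `load_query` and `test_compound` are hypothetical callbacks standing in for the disk read of a query and for the MOLFM/STRUC tests, the query area is assumed to be emptied before each new round of loading, and the overlapped read of the next compound is not modelled.

```python
def run_search(accession, capacity, load_query, test_compound):
    """accession: (query number, compound address) pairs sorted by compound address.
    capacity: how many queries fit in the core area set aside for them."""
    if capacity < 1:
        raise ValueError("the query area must hold at least one query")
    hits, pos, n = [], 0, len(accession)
    while pos < n:
        in_core = {}
        # First pass: from the current position, read in queries until the area is full.
        for query_no, _ in accession[pos:]:
            if query_no not in in_core:
                if len(in_core) == capacity:
                    break
                in_core[query_no] = load_query(query_no)
        # Second pass: search compounds in ascending address order (no tape backup)
        # until a pair refers to a query that is not in core.
        while pos < n:
            query_no, address = accession[pos]
            if query_no not in in_core:
                break
            if test_compound(in_core[query_no], address):
                hits.append((query_no, address))
            pos += 1
    return hits
```

Because each round starts at a pair whose query is not yet in core and the area holds at least one query, every round makes progress, and the compound addresses keep rising across rounds.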

3.2.1.2 Program Structure

The executive programs accept as input a file of compounds, a set of
queries, a query disk-to-core table, and a merged accession list. The query
disk-to-core table is used to give the locations of each query both on disk and,
when applicable, in core. This table is read from a utility (either tape or
disk) which was created by program READ during the query preprocessing.

The accession list is used by the executive to determine which compounds
to search against each of the queries. It is read from a utility (either
tape or disk) which was created by program KIAD and sorted by compound address
during the query preprocessing. For purposes of simplicity, it was implied in the
program description above that the entire accession list is read into core at
one time. This cannot be the case, since the accession list may be quite long.
Up  to  920  words  of  the  accession  list  may  be  in  core  at  once.  More  is  read  as 
needed.  The  program,  however,  still  functions  in  the  manner  described  above. 
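Reading the accession list through a bounded buffer can be sketched with a generator, so that the rest of the executive still sees one pair at a time. The 920-word limit comes from the text; the word order within a pair (query number, then compound address) and the function name are assumptions.

```python
from itertools import islice

BUFFER_WORDS = 920  # at most 920 words of the accession list in core at once

def accession_pairs(words, buffer_words=BUFFER_WORDS):
    """Yield (query number, compound address) pairs, refilling the buffer as needed."""
    if buffer_words < 2 or buffer_words % 2:
        raise ValueError("the buffer must hold a whole number of word pairs")
    source = iter(words)
    while True:
        buffer = list(islice(source, buffer_words))  # more is read as needed
        if not buffer:
            return
        if len(buffer) % 2:
            raise ValueError("the accession list ends in the middle of a pair")
        for i in range(0, len(buffer), 2):
            yield buffer[i], buffer[i + 1]
```

With the default size, each refill supplies 460 pairs.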

The  queries  are  stored  on  a  disk  utility  by  program  EXEC30.  Each  query 
starts  on  a  new  track.  The  internal  format  of  a  query  is  shown  in  Table  II. 

The compound file is contained either on a series of tapes or on a disk
utility. Due to the limited size of the disk, only approximately 40,000 compounds
may be stored on it. These compounds are loaded onto the disk by a utility
program prior to the running of the search system.

The compound file may be unlimited in size when it is contained on a
series of tapes. The executive programs automatically switch from one tape to
another. Mounting messages are printed on the on-line typewriter, giving the
operator the needed information as to which tapes to mount on which tape drives.
The executive requires the mounting of only those tapes which are actually to
be searched. Thus, in a particular run of the retrieval system, some of the
file tapes may not even have to be mounted.

Program TAPE will print an error message for any of the following three
conditions:

(1) A tape address is encountered which is smaller than a
previous address. This would cause the tape being searched to
be rewound.

(2) A tape address is encountered which is larger than the
largest possible tape address for the file.

(3) An error occurred while skipping to the next compound
to be searched.

Programs TXINFO and DISKTT will print an error message for either of
the following two conditions:

(1)  The  occurrence  of  an  error  while  skipping  to  the  next  compound 
to  be  searched. 

(2)  The  occurrence  of  an  error  while  reading  into  core  a  compound 
to  be  searched. 

When  any  of  the  above  errors  occur,  the  search  run  is  terminated  after 
the  error  message  has  been  printed. 
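The first two TAPE conditions are checks on the address sequence and can be sketched as a single scan. The message wording and function name are hypothetical, the I/O errors of condition (3) are not modelled, and equal consecutive addresses are accepted because several queries may select the same compound.

```python
def first_address_error(addresses, largest_address):
    """Return a message for the first address that would end the run, or None."""
    previous = None
    for address in addresses:
        if previous is not None and address < previous:
            # Going backwards would force the tape being searched to be rewound.
            return f"ADDRESS {address} SMALLER THAN PREVIOUS ADDRESS {previous}"
        if address > largest_address:
            return f"ADDRESS {address} LARGER THAN LARGEST FILE ADDRESS {largest_address}"
        previous = address
    return None
```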
3.2.2 Molecular Formula Search


Code  Name:  MOLFM 

Programmer:  Richard  Haber 

Abstract: Program MOLFM is used to determine whether the molecular
formula of a particular file compound satisfies the requirements specified in a
particular query.

3.2.2.1 Program Description

A macro flow chart describing this program is presented in Figure 57.

Program MOLFM is used to determine whether the molecular formula of a
particular file compound satisfies certain requirements specified in a query.
MOLFM is called by the search executive when a molecular formula search is
asked for in the query.

Both the compound and the query may contain addends; thus they may each
contain more than one molecular formula. MOLFM is used to test one specific
compound molecular formula against the requirements of one specific query
molecular formula. The detailed formats of the molecular formulas are shown in
Section 3.2.2.2.

When MOLFM receives control, it first checks to make certain that the
query actually contains a molecular formula. If no molecular formula is present
in the query, control is returned to the executive program and the search is
considered to have failed.

If a molecular formula is present, the actual molecular formula search is
then performed. All information specified in the query must be contained in the
compound for the compound to be considered as satisfying the query requirements.

As shown in Figure 13, the numbers of carbon, hydrogen, nitrogen, and oxygen
atoms are stored in the first word of the molecular formula of the compound.
All other elements appear alphabetically in the following words. In the query,
the elements are arranged such that carbon is first, hydrogen is second, and
the remaining elements are in alphabetical order.

Various  types  of  searches  are  possible  for  each  element  in  the  query: 

a) Exact search. A test is made to determine whether the
compound contains exactly the same number of atoms of the
particular element as appear in the query. Any number of
atoms between 0 and 63 may be requested for any element in
an exact search. For oxygen and nitrogen, any number of
atoms up to 127 may be requested, while the upper limit
for carbon is 255 and for hydrogen is 511.

b) Range search. The compound is checked for the presence
of the element in question. When the element has been
found, its atom count is compared with the limits appearing
in the query. For the compound to be considered
further, the atom count of the element in the compound must
be greater than or equal to the lower limit and less than or
equal to the upper limit. The same restrictions on the
maximum number of atoms requested for each element apply here
also.

c) A search to check for the presence of the given element
within the compound. In this case the number of atoms of
the element does not matter.

Figure 57. Macro Flow Chart - MOLFM

The symbol X is used as an element symbol in queries. It represents the sum
of all halogens (Br, Cl, F, and I) and is treated like any other element symbol
by MOLFM. For example, both C2F6 and C2Cl2F4 would satisfy a query requiring
six halogens (X6). C2Cl2F4 would also satisfy a query requiring four fluorine
atoms and six halogens (F4X6), since the total halogen requirement is
independent of the requirements placed on specific halogen elements.

MOLFM  also  has  the  capability  of  performing  a  restricted  search.  For  a 
compound  to  pass  a  restricted  search,  each  of  its  elements  must  have  passed  one 
of  the  three  types  of  searches  already  mentioned.  In  other  words,  in  order  to 
pass,  a  compound  may  contain  no  elements  other  than  those  listed  in  the  query. 
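The three element tests, the halogen pseudo-element X and the restricted search can be combined in a short Python sketch. The `ElementTest` encoding and the dictionary form of the compound formula are hypothetical; the sketch also assumes that, in a restricted search, listing X in the query permits the individual halogens, a point the text leaves open.

```python
from dataclasses import dataclass

HALOGENS = ("Br", "Cl", "F", "I")

@dataclass(frozen=True)
class ElementTest:
    symbol: str      # element symbol, or "X" for the sum of all halogens
    kind: str        # "exact", "range", or "present"
    low: int = 0     # exact count, or lower limit of a range
    high: int = 0    # upper limit of a range

def molfm(tests, compound, restricted=False):
    """True if compound (element -> atom count) passes every query element test."""
    def count(symbol):
        if symbol == "X":
            return sum(compound.get(h, 0) for h in HALOGENS)
        return compound.get(symbol, 0)

    for t in tests:
        n = count(t.symbol)
        if t.kind == "exact" and n != t.low:
            return False
        if t.kind == "range" and not t.low <= n <= t.high:
            return False
        if t.kind == "present" and n == 0:
            return False
    if restricted:
        listed = {t.symbol for t in tests}
        if "X" in listed:
            listed.update(HALOGENS)  # assumption: X also covers the individual halogens
        # A restricted search fails if the compound contains any unlisted element.
        if any(n > 0 and e not in listed for e, n in compound.items()):
            return False
    return True
```

With these definitions, C2Cl2F4 passes both the X6 and the F4X6 queries from the text, while C2F5 fails X6.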

3.2.2.2 Program Structure

MOLFM  is  a  subroutine  which  returns  with  the  sign  of  the  accumulator  set 
to  plus  if  the  search  has  passed  and  to  minus  if  the  search  has  failed.  The 
format  of  the  molecular  formula  in  the  query  is  as  follows: 


Word of Formula      Bits     Contents

First Word           3-17     Number of words in molecular formula
                     19       = 1 if restricted search
                              = 0 otherwise

Additional           0-11     Element (BCD)
Element Words        12-20    Maximum number of atoms if range search;
                              number of atoms if exact search;
                              = 0 otherwise
                     21-28    Minimum number of atoms if range search;
                              = 0 otherwise
                     34       = 1 if search of atom is a range search
                              = 0 otherwise
                     35       = 1 if search of atom is an exact match search
                              = 0 otherwise

The  format  of  the  molecular  formula  in  the  compound  is  given  in  Section 
2.2.4. 
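The query molecular formula layout above can be decoded field by field, which also shows that the 9-bit field in bits 12-20 holds values up to 511 and the 8-bit field in bits 21-28 up to 255, consistent with the limits quoted in Section 3.2.2.1. In this sketch bit 0 is taken as the high-order bit of a 36-bit word, the function names are hypothetical, and the two 6-bit BCD character codes of the element are returned undecoded.

```python
WORD_BITS = 36

def field(word, first, last):
    """Extract bits first..last (bit 0 is high-order, bit 35 is low-order)."""
    return (word >> (WORD_BITS - 1 - last)) & ((1 << (last - first + 1)) - 1)

def decode_first_word(word):
    return {"words": field(word, 3, 17), "restricted": bool(field(word, 19, 19))}

def decode_element_word(word):
    bcd = field(word, 0, 11)
    entry = {
        "element_bcd": (bcd >> 6, bcd & 0o77),  # two 6-bit BCD character codes
        "range": bool(field(word, 34, 34)),
        "exact": bool(field(word, 35, 35)),
    }
    if entry["range"]:
        entry["low"], entry["high"] = field(word, 21, 28), field(word, 12, 20)
    elif entry["exact"]:
        entry["count"] = field(word, 12, 20)
    return entry
```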
3.3 PRESENTATION OF RESPONSES


3.3.1  Registry  Number  and  Descriptor  Print  Program 


Code  Name:  EAPRN 

Programmer :  Richard  Haber 

Abstract: Program EAPRN is used to format and print compound registry
numbers and descriptors obtained as retrievals to queries searched against
the tape file.

3.3.1.1 Program Description

A macro flow chart describing this program is presented in Figure 58.

Program EAPRN is used to format and print registry numbers and descriptors
of compounds retrieved by the batch search system. This information is
contained on a system utility as a result of TAPE (Section 3.2.1)
and has been sorted by query number.

EAPRN obtains the retrievals (query number, registry number, and any
associated descriptors) one at a time and prints the query number. One
twelve-character registry number is then printed on each line. It may be
followed by 0-10 descriptors, each of which is preceded by the letters EA
(standing for Edgewood Arsenal).

The  number  of  retrievals  obtained  for  each  query  is  printed  below  the 
list  of  registry  numbers  for  the  query.  A  new  page  is  used  to  list  the 
answers  of  each  new  query. 
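The report layout described above (query number, one registry number per line with EA-prefixed descriptors, a retrieval count, a new page per query) can be sketched with `itertools.groupby`. The heading and count wording are hypothetical, registry numbers are treated as strings right-justified in twelve characters, and each page is returned as a list of lines rather than printed.

```python
from itertools import groupby

def eaprn_pages(records):
    """records: (query number, registry number, descriptors) sorted by query number.
    Returns one list of lines per page, one page per query."""
    pages = []
    for query_no, group in groupby(records, key=lambda r: r[0]):
        lines = [f"QUERY {query_no}"]
        count = 0
        for _, registry_no, descriptors in group:
            if len(descriptors) > 10:
                raise ValueError("at most 10 descriptors per compound")
            fields = [f"{registry_no:>12}"] + [f"EA{d}" for d in descriptors]
            lines.append("  ".join(fields))
            count += 1
        lines.append(f"{count} RETRIEVALS")  # number of retrievals for this query
        pages.append(lines)
    return pages
```

Because the records have already been sorted by query number, `groupby` sees each query's retrievals as one contiguous run.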

3.3.1.2 Program Structure

EAPRN is an autonomous program which accepts 14-word IOBS type j
records from system utility unit 14. The first word of each record is considered
to contain a query number, the next two a compound registry number, and the
remainder any descriptors which may be present.

EAPRN is terminated when an end-of-file is encountered on the utility.


Figure  58.  Macro  Flow  Chart  -  EAPRN 
3.3.2 Structural Formula Reconstruction for Paper Tape Output


Code Name: DURPIX
Programmer :  Helen  Hill 


Abstract: Reconstructs the picture for output from the 7040 through the
PDP-8 to punch Dura Mach paper tape.

3.3.2.1 Program Description

A  macro  flow  chart  describing  this  program  is  presented  in  Figure  59. 

DURPIX takes the SCRUB list and decodes each word one at a time,
filling a buffer with the proper Dura Mach characters to punch paper tape
efficiently. This tape can then be used to type the picture on the Dura Mach.

3.3.2.2 Program Structure

The program occupies 894 core locations and contains a 127-location
table which translates compact Dura Mach characters to actual Dura Mach
characters. Subroutine PACK is used to pack Dura Mach characters for output.
Transmission of the information to the PDP-8 for the production of paper
tape is accomplished by a modified version of JOBOUL.

DURPIX  is  a  subroutine  which  takes  as  input  the  SCRUB  list  and  outputs 
a  buffer  of  packed  Dura  Mach  characters. 
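One way a buffer of Dura Mach characters can be filled economically is to emit a case shift only when the case actually changes, leaving the typewriter in lower case at the end as Section 3.3.4.1 requires. This is a hypothetical sketch, not DURPIX: it assumes the case bit is the high bit (octal 100) of the 7-bit compact code described in Section 3.3.3.1, uses placeholder strings for the shift codes, and indexes a translation table by the 6-bit part only.

```python
CASE_BIT = 0o100     # assumption: the high bit of a 7-bit compact code selects upper case
UPPER_SHIFT = "UC"   # hypothetical placeholders for the Dura Mach case-shift codes
LOWER_SHIFT = "LC"

def dura_stream(compact_codes, table):
    """Translate compact codes with `table` (6-bit code -> Dura character), emitting a
    shift code only when the case changes; the stream starts and ends in lower case."""
    out, upper = [], False
    for code in compact_codes:
        want_upper = bool(code & CASE_BIT)
        if want_upper != upper:
            out.append(UPPER_SHIFT if want_upper else LOWER_SHIFT)
            upper = want_upper
        out.append(table[code & 0o77])
    if upper:
        out.append(LOWER_SHIFT)  # leave the typewriter in lower case
    return out
```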


3.3.3 Structural Formula Reconstruction


Code Name: PIX & LINPIX
Programmer:  Helen  Hill 

Abstract: PIX reconstructs the picture from the stored structural
formula image and calls TRANSL and WRITE to output it on the 1401 printer. LINPIX
reconstructs the picture line by line for output on the chemical line printer.

3.3.3.1 Program Description

A  macro  flow  chart  describing  PIX  is  presented  in  Figure  25. 

PIX  and  LINPIX  take  as  input  the  stored  structural  formula  image  (SCRUB) 
containing  each  character  in  the  structure  and  its  relative  location  in  a  matrix, 
and  the  x  and  y  size  of  the  matrix.  PIX  reconstructs  the  picture  in  the  matrix 
in  compact  internal  code  (six  bit  code  with  an  added  case  bit).  LINPIX  does  the 
same  thing  one  line  at  a  time. 
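The two reconstruction strategies can be contrasted in a short sketch: one fills a whole DELX by DELY matrix (as PIX does with its 10000-location matrix), the other builds a single line buffer at a time (as LINPIX does). The entries here are simplified to (character, x, y) tuples with 0-based coordinates and row 0 at the top; real SCRUB words hold compact codes, and the function names are hypothetical.

```python
def reconstruct(scrub, delx, dely, blank=" "):
    """PIX-style: place every (character, x, y) entry into a full DELX-by-DELY matrix."""
    grid = [[blank] * delx for _ in range(dely)]
    for ch, x, y in scrub:
        if not (0 <= x < delx and 0 <= y < dely):
            raise ValueError(f"({x}, {y}) lies outside the {delx} x {dely} matrix")
        grid[y][x] = ch
    return ["".join(row) for row in grid]

def lines_one_at_a_time(scrub, delx, dely, blank=" "):
    """LINPIX-style: yield the picture one line at a time using a single line buffer."""
    by_row = {}
    for ch, x, y in scrub:
        by_row.setdefault(y, []).append((x, ch))
    for y in range(dely):
        line = [blank] * delx
        for x, ch in by_row.get(y, []):
            line[x] = ch
        yield "".join(line)
```

Both produce the same lines; the line-at-a-time version trades the large matrix for a buffer of one row.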


Compact internal character code: case bit + 6-bit Dura code


3.3.3.2 Program Structure

PIX is a subroutine which utilizes a 10000-location matrix in which
to reconstruct the picture. LINPIX uses a 100-location buffer to reconstruct
a single line of the picture.

PIX and LINPIX make use of the following input:

SCRUB - 701 locations
DELX and DELY
UNPTAB - table of underlines


Figure  60.  Macro  Flow  Chart 


3.3.4 Dura Mach Output Package and Teletype Output Package


Code  Name:  DURADK,  MFOU,  LEADER 

Programmer :  James  Gerber 

Abstract: DURADK contains routines to translate and format information
and to punch this on the teletype to produce a tape for the Dura Mach
typewriter. Leader is produced by routine LEADER in another deck. MFOU will format
and print the Molform on the teletype or line printer.

3.3.4.1 Program Description

DURADK consists of the following routines, which are called by TXINFO to
produce Dura Mach output:

(1) MFDURA will punch the Molform in Dura code to be typed
out in readable format. It operates similarly to MFOU.

The  location  of  the  first  word  of  the  Molform  is  found 
in  the  accumulator  when  the  routine  is  called. 

(2) QUERNO will punch out the query number and the header
query number. The location of the query number is in
the accumulator when QUERNO is called.

(3) REGNO will print the registry number with leading zeros
eliminated. The registry number is preceded by the
characters RN; it is also preceded by the Dura "print on"
code. The accumulator contains the location of the first
word of the two-word registry number when REGNO is called.

(4) QNASCI will punch the Dura "print off" code and then the
query number in ASCII code. The accumulator should contain the
location of the query number when QNASCI is called.

DURADK uses routines BINBCD and JOBOUL. If used without the CIDS JOBOUL
package, the resulting output on the 1403 line printer will be a garbled
version of the Dura output, since the output is formatted three characters to a
7040 word. Thus each Dura character will print as two printer characters which
will  have  no  relation  to  the  Dura  character. 
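The arithmetic behind the garbled printout can be made concrete: three Dura characters in a 36-bit word leave 12 bits per character, and a line printer reads the same word as six 6-bit characters, two per Dura character. In this sketch the function names are hypothetical, a final partial word is assumed to be padded with zeros, and characters are packed left to right.

```python
def pack_three(codes):
    """Pack 12-bit Dura character slots three to a 36-bit word (zero-padded at the end)."""
    words = []
    for i in range(0, len(codes), 3):
        chunk = list(codes[i:i + 3])
        chunk += [0] * (3 - len(chunk))
        word = 0
        for c in chunk:
            if not 0 <= c < 1 << 12:
                raise ValueError("each slot holds 12 bits")
            word = (word << 12) | c
        words.append(word)
    return words

def as_printer_characters(word):
    """Split a 36-bit word into the six 6-bit codes a line printer would print."""
    return [(word >> shift) & 0o77 for shift in range(30, -1, -6)]
```

For example, the word packed from the slots 0o1234, 0o5670 and 0o0001 prints as the six codes 0o12, 0o34, 0o56, 0o70, 0o00 and 0o01, none of which corresponds to a Dura character.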

These routines expect the Dura Mach to be at the beginning of a line and
always return with the Dura Mach at the beginning of a line. The Dura Mach
must be in lower case at both the start and end. Any routines used with this
package must therefore return with the Dura Mach in lower case and at the
beginning of a line.

3.3.4.2 Program Structure

MFOU  consists  of  two  routines: 

(1) MFOU is called with the location of the first word of the
item Molform in the accumulator. MFOU will format and
print the Molform (both Hill and addend Molform if present)
using JOBOUL. If the teletype version of JOBOUL is used,
this will print out on the on-line (PDP) teletypes.

(2)  LEADER  will  produce  about  12"  of  leader  (octal  code  200) 
on  the  teletype. 

MFOU  uses  routines  BINBCD  and  JOBOUL. 


APPENDIX A


SYSTEM  FLOWCHART  WITH  PROGRAM  NAMES 


A macro flowchart of the complete CIDS system is presented on the
following pages. The code names of the programs required to perform each
phase of processing are included. The file construction subsystem is
presented first. The initial processing differs for chemical typewriter
input and CAS input, so these charts are shown separately. A single
chart describes the remainder of the processing, which is the same for both
types of input. The search subsystem is divided into two charts: the
batch search system and the real time search system. The designation CF
on a tape symbol means that the compound file is in CIDS record format.
File Construction (Chemical Typewriter Input)

File Construction (CAS Input)

File Construction (All Input) (Continued)

Search System

Search System


APPENDIX  B 


ADDMF 


APOLGY 


BONDCT 


CASFMT 


CLEANM 


COMPR 


PROGRAM  ABSTRACTS 


Addition  of  Molecular  Formula  (2.1.3) 

ADDMF  reads  a  tape  of  compound  connection  tables  which  have  been  translated 
from  CAS  to  CIDS  format  and  are  ordered  by  CAS  Registry  Number. 

Molecular  formula  data  from  the  CAS  Bibliography  File  is  added  to  this 
tape  and  the  compound  records  are  rewritten  in  CIDS  record  format. 

Error  Message  Program  (2,2.9) 

APOLGY  is  transferred  to  from  ORGNZR,  EXCESS,  REGRUP,  MOLFRM,  and 
MONIKP  to  write  error  messages  using  Fortran  read-write  routineo. 

Bond  Count  (2,4.10) 

Program  BONDCT  assigns  a  specific  limited  subclass  of  the  aliphatic 
keys.  It  assigns  the  acyclic  nucleus  keys  and  two  types  of  hydro¬ 
carbon  radical  keys. 

CAS  Structure  Conversion  (2.1.1) 

CASFMT  reads  the  CAS  Structure  Master  File  and  translates  the  infor¬ 
mation  to  tin-*  CIDS  format.  The  output  of  CASFMT  is  a  tape  containing 
the  registry  number,  connection  table,  and  abnormality  table  (if 
present)  for  each  CAS  compound  converted. 

CLEANM - Reduction of the Matrix to Points and Lines (2.2.11)

CLEANM is given a pointer to a specific node by SETUP. It then "cleans"
the eight locations around that node in the matrix for use in MAKECT.
All charge signs and mass numbers are removed, double-letter elements
are replaced by a one-word symbol, and special cases such as Ph and
-(C)- are treated. An abnormality table of abnormal masses, charges, and
valences is created. A connection table number is assigned to each
atom, and the word in SCRUB corresponding to a node that has been
processed by CLEANM is made minus. Control is returned to SETUP after
the operation on the given node is complete.

COMPR - Ring Compression (2.4.6)

Program COMPR (RING3) removes all atoms in the connection table that
have exactly two attachments and removes side chains from the structure,
so that the ring descriptors may be found. The program also contains a
subroutine that removes a prescribed path from the structure.


CONVRT  -  Structure  Conversion  and  Compression  (2.1.2) 

This  program  converts  a  structure  to  a  format  suitable  for  storage 
and  searching.  The  structure  is  compressed  to  facilitate  the  atom- 
by-atom  search  by  removing  carbon  atoms  with  exactly  two  direct 
attachments,  leaving  only  a  description  of  the  bonds  in  the  chain. 

The  program  will  also  format  structures  which  are  query  fragments, 
in  which  case  the  resulting  connection  table  has  redundancy  removed 
and  the  atoms  are  ordered  to  speed  searching.  In  addition,  the 
various  types  of  free,  or  hanging,  bonds  are  formatted. 
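The compression CONVRT performs can be illustrated with a short Python sketch. It is not the CIDS code: the input shape (an atom-to-element dict and a list of `(atom_a, atom_b, bond_type)` tuples), the function name `compress`, and the integer bond codes are all hypothetical. The sketch drops every carbon with exactly two attachments and records, for each retained atom, the bond types met on the way to the next retained atom.

```python
from typing import Dict, List, Tuple

# Hypothetical input: atom number -> element symbol, plus (atom_a, atom_b, bond_type) bonds.
Atoms = Dict[int, str]
Bonds = List[Tuple[int, int, int]]
Paths = Dict[int, List[Tuple[int, List[int]]]]


def compress(atoms: Atoms, bonds: Bonds) -> Paths:
    """Drop carbons with exactly two attachments, folding their bonds into paths.

    Returns, for every retained atom, a list of (end_atom, bond_string) pairs,
    where bond_string lists the bond types met along the path.
    """
    adj: Dict[int, List[Tuple[int, int]]] = {a: [] for a in atoms}
    for a, b, t in bonds:
        adj[a].append((b, t))
        adj[b].append((a, t))

    def dropped(a: int) -> bool:
        return atoms[a] == "C" and len(adj[a]) == 2

    paths: Paths = {}
    for start in sorted(atoms):
        if dropped(start):
            continue
        paths[start] = []
        for nxt, t in adj[start]:
            prev, cur, bond_string = start, nxt, [t]
            # Walk through dropped carbons until a retained atom is reached.
            while dropped(cur):
                (n1, t1), (n2, t2) = adj[cur]
                # A dropped carbon has two neighbours: continue to the one we did not come from.
                if n1 != prev:
                    prev, cur, step = cur, n1, t1
                else:
                    prev, cur, step = cur, n2, t2
                bond_string.append(step)
            paths[start].append((cur, bond_string))
    return paths
```

For a benzene ring carrying one substituent, only the substituted ring atom survives, and the ring appears as two six-bond paths leading back to that atom, one in each direction. A ring made up entirely of dropped carbons is not handled here; one of its atoms would have to be kept explicitly.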

DISKTT - See TAPE.

DURADK - Dura Mach Output Package and Teletype Output Package (3.3.4)

DURADK contains routines to translate and format information and to
punch it on the teletype, producing a tape for the Dura Mach typewriter.

DURPIX - Structural Formula Reconstruction for Paper Tape Output (3.3.2)

DURPIX reconstructs the picture for output from the 7040 through the
PDP-8 to punch Dura Mach paper tape.

EAPRN - Registry Number and Descriptor Print Program (3.3.1)

Program EAPRN formats and prints the registry numbers and descriptors of
compounds obtained as retrievals to queries searched against the tape
file.

EXCESS - Structure of Non-Bracketed Information (2.2.8)

EXCESS formats all structural characters appearing outside of
brackets in a bracketed structure.

EXEC30 - Query Preprocessor (3.1.2)

The Query Preprocessor is a set of programs that scans the source text
of a query presented by the user and translates it into the internal
coding of the retrieval system. Queries are checked for syntactical
errors and edited. In this role, the Query Preprocessor communicates
with the other programs of the CIDS system to allow the system to adjust
to particular user requirements.

HCRCT - Nonspecific Hydrocarbon Radical Key Assignment (2.4.9)

Program HCRCT assigns to a structure a subclass of the set of
aliphatic hydrocarbon radical keys.
HLDPRC - Hold Tape Processor (2.3.2)


HLDPRC is a program of the Registry System that processes compounds on
the Hold tape produced by program STARTA, according to an action code
punched on each card of the TID card deck produced by STARTA.

The action codes record a chemist's decision to register, ignore, or
update the compound record for each compound on the Hold tape.

INDEX - Index Creation (2.5.4)

The key-to-compound locator table used by the CIDS Retrieval System is
created by program INDEX from the inverted key list produced by program
MERGE.

INPUTD - Dura Mach Input Program (2.2.2)

This program accepts magnetic tape images of the paper tape chemical
records typed by the Dura Mach chemical typewriter and reconstructs the
chemical record in a 2-dimensional array called MATRIX.

INPUT  -  Query  Input  Executive  (3.1.1) 

Program  INPUT  is  used  to  keep  track  of  the  number  of  queries  correctly 
entered  in  the  system.  It  also  stores  the  disk  address  of  each  query 
in  the  query  disk-core  table. 

KEYSRT - Key-Address Sort (2.5.2)

KEYSRT sorts the Key-Address tape output by program NUFILE. The
key-address pairs are sorted in ascending order of key number, and the
ascending order of addresses in which NUFILE produced them is
maintained.

KIAD - Key-Expression to Accession List Processor (3.1.4)

KIAD accepts the Boolean key expression as input and produces the list
of all compound record addresses that pass the key expression (the
accession list).
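KIAD's task, turning a Boolean key expression into an accession list, amounts to set operations on the inverted lists. The Python sketch below assumes each key's inverted list is an ascending list of record addresses and represents expressions as nested tuples (`("AND", e1, e2)`, `("OR", e1, e2)`, `("NOT", e1)`); that representation and all names are hypothetical, not the CIDS key-expression format. NOT is taken relative to the full list of file addresses.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical expression form: a key name, or ("AND", e1, e2), ("OR", e1, e2), ("NOT", e1).
Expr = Union[str, Tuple]


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear merge of two ascending address lists, keeping common addresses."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[int], b: List[int]) -> List[int]:
    """Linear merge of two ascending address lists, keeping every address once."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def difference(a: List[int], b: List[int]) -> List[int]:
    """Addresses in a that are not in b (both ascending)."""
    i = j = 0
    out: List[int] = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


def evaluate(expr: Expr, index: Dict[str, List[int]], all_addresses: List[int]) -> List[int]:
    """Return the accession list (ascending addresses) satisfying expr."""
    if isinstance(expr, str):
        return index.get(expr, [])
    op, *args = expr
    if op == "AND":
        return intersect(evaluate(args[0], index, all_addresses),
                         evaluate(args[1], index, all_addresses))
    if op == "OR":
        return union(evaluate(args[0], index, all_addresses),
                     evaluate(args[1], index, all_addresses))
    if op == "NOT":
        return difference(all_addresses, evaluate(args[0], index, all_addresses))
    raise ValueError(f"unknown operator {op!r}")
```

Because every list stays sorted, each operation is a single linear pass, which suits lists read sequentially from a mass storage device.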

LEADER - Dura Mach Output Package and Teletype Output Package (3.3.4)

Paper tape leader is produced by routine LEADER.

LINPIX - Structural Formula Reconstruction (3.3.3)

LINPIX  reconstructs  the  picture  from  the  stored  structural  formula 
image  line  by  line  for  output  on  the  chemical  line  printer. 
MAKECT - Generation of the Connection Table (2.2.12)

MAKECT assumes a MATRIX of nodes and connecting lines. A list is
generated for each node indicating the type of node (element type), all
associated points, and the connecting line types (bonds). This program
also indicates whether the atom is to be multiplied in order to be
correctly compared with the molecular formula during chemical
verification.

MERGE - Key-Address Merge (2.5.3)

MERGE combines the Sorted Key-Address tape (see NUFILE and KEYSRT) with
the Old Merged Key-Address tape, which contains all the keys in the file
(prior to the present run) and the addresses of their occurrence, to
produce a New Merged Key-Address tape.

MFOU - Dura Mach Output Package and Teletype Output Package (3.3.4)

MFOU formats and prints the molform on the teletype or line printer.

MFSRN - Molecular Formula Key Assignment (2.4.11)

Molecular Formula keys are assigned to a compound based on the Hill
molecular formula. One key is assigned to identify each element
present.

MOLE - Connection Table Processor (3.1.6)

Program MOLE processes chemical structural data in the form of manually
generated connection tables. The data is converted into an intermediate
format, which is then compressed into the CIDS internal connection table
by program CONVRT.

MOLEF - Molecular Formula Extraction Program (2.1.4)

Subroutine MOLEF consists of a package of programs that locate and
extract the file record corresponding to a given registry number from
the CAS Bibliography tapes. Summation and addend molecular formulas are
computed and stored in CIDS format.

MOLFM - Molecular Formula Search (3.2.2)

Program MOLFM determines whether the molecular formula of a particular
file compound satisfies the requirements specified in a particular
query.

MOLFRM - Molecular Formula Format Program (2.2.4)

This program formats the molecular formulas in the typed chemical
record.

MONIKR - Nomenclature and Reference Field Formatting Program (2.2.5)

MONIKR formats the nomenclature and any other information typed with ...
MOPACK - Molecular Formula Translator (3.1.7)


This  program  translates  the  query  molecular  formula  to  internal  query 
format.  It  checks  the  query  molecular  formula  for  syntactical  errors 
and  indicates  these  to  the  calling  program. 

MUSTRP  -  Alternate  Path  Search  (2.4.5) 

MUSTRP  (RING2)  is  given  a  connection  table  path  between  two  nodes  in 
a  ring  and  searches  for  any  alternate  paths  between  these  two  nodes. 

If  more  than  one  alternate  path  is  found,  the  "best"  of  these  is 
chosen. 

NFCF - Expansion of the Connection Table (2.2.15)

This program expands the connection table from the internal format to
the format acceptable to program CONVRT, and prints the connection table
and abnormality table if a switch is set.

NUFILE - Search File Creation or Update (2.5.1)

NUFILE creates a search file of compounds by simply assigning each
compound to an area in the file as it is input to NUFILE. An existing
file may be updated by the same process. NUFILE simultaneously creates a
tape of key and file address pairs, which will be used by programs MERGE
and INDEX to create an index to the complete compound file.
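The path from NUFILE's key-address pairs through KEYSRT and MERGE to the INDEX locator table can be sketched in a few lines of Python. The function names echo the program names only for readability; the data shapes (registry numbers as strings, keys as integers, file addresses as consecutive integers) are assumptions, and tapes are modeled as in-memory lists.

```python
from heapq import merge
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (key number, file address)


def nufile(compounds: Sequence[Tuple[str, List[int]]],
           start: int = 0) -> Tuple[Dict[int, str], List[Pair]]:
    """Assign consecutive file addresses and emit (key, address) pairs."""
    records: Dict[int, str] = {}
    pairs: List[Pair] = []
    for offset, (reg_no, keys) in enumerate(compounds):
        address = start + offset
        records[address] = reg_no
        pairs.extend((key, address) for key in keys)
    return records, pairs


def keysrt(pairs: List[Pair]) -> List[Pair]:
    """Sort by key only; sorted() is stable, so addresses stay ascending within a key."""
    return sorted(pairs, key=lambda p: p[0])


def merge_tapes(old: List[Pair], new: List[Pair]) -> List[Pair]:
    """Merge two (key, address)-ordered tapes into one."""
    return list(merge(old, new))


def build_index(merged: List[Pair]) -> Dict[int, List[int]]:
    """Key -> ascending list of addresses (one inverted list per key)."""
    return {key: [addr for _, addr in group]
            for key, group in groupby(merged, key=lambda p: p[0])}
```

An update run calls `nufile` with `start` set to the next free address, sorts the new pairs, and merges them into the previous merged tape before rebuilding the index.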

ORGNZR  -  Field  Recognizer  and  Format  Program  (2.2.3) 

ORGNZR  takes  the  reconstructed  matrix  (in  Mergenthaler  code  with  a 
case  bit  added)  and  recognizes  each  field  in  the  chemical  record. 

It  formats  the  temporary  identification  number,  the  security 
classification,  the  molecular  formula  (both  Hill  and  addend  molform 
if  the  latter  is  present),  the  structural  formula  image,  the  stereo 
information  and  the  nomenclature. 

PACKED - Key Packing Program (3.1.5)

This program translates the individual key names from query language
format to internal format.

PHASE5 - Calling Program for Chemical Verification (2.2.13)

This is the calling program for VERIFY. If a compound is found to be
correct by verification, this program transfers to NFCF. Otherwise, an
error exit is taken and control is transferred to REJECT.

PIX - Structural Formula Reconstruction (3.3.3)

PIX reconstructs the picture from the stored structural formula image
and calls TRANSL and WRITE to output it on the 1401 printer.


PSCKYT - Nonspecific Phosphorus Functional Group (2.4.12)


Subroutine  PSCKYT  assigns  keys  to  compounds  which  contain  certain  types 
of  phosphorus  functional  groups  which  were  not  among  those  selected 
as  Specific  Functional  Group  keys. 

PUNCH - Descriptor Punch Program (2.2.6)

This program finds the EA, T, and TL descriptors, if there are any. It
then gets the corresponding TID number of the compound and punches it on
a card, followed by the EA, T, or TL descriptor number.

READ - Query Reader (3.1.3)

Program READ is used to read queries from an input device. It can read
either punched cards or punched paper tape produced by a teletype.

REGRUP - SFI Reordering Program (2.2.7)

This program reorders the SFI when brackets or a monovalent salt are
present, so that all characters within a given set of coordinates appear
compactly in the SFI.

REGUD  -  Registry  Print  Tape  Update  (2.3.3) 

REGUD  updates  the  Print  tape  by  adding  new  records  for  a  group  of 
newly  registered  compounds. 

REJECT - Rejection of Incorrect Records (2.2.17)

Control is transferred to this program from various portions of the
CHEMTYPE system. A message is printed out, and the program then
transfers to AEND.

RING1  -  Ring  Analysis  Executive  (2.4.4) 

The  general  function  of  RING1  is  to  find  the  smallest  set  of  smallest 
cycles  in  a  compound  patterned  after  the  rules  of  the  Ring  Index. 

These  cycles  are  determined  and  the  generic  cyclic  nuclei  keys  are 
assigned  to  the  compound. 
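RING1 itself determines the rings following the Ring Index rules, and that procedure is not reproduced here. A related check is easy to state: the number of rings in a smallest set of smallest rings equals the cyclomatic number, bonds minus atoms plus connected components. The Python sketch below computes that count with a union-find over the bonds; the function name and input shape (atoms numbered 1 to n, each bond listed once) are assumptions.

```python
from typing import List, Tuple


def ring_count(n_atoms: int, bonds: List[Tuple[int, int]]) -> int:
    """Rings in a smallest set of smallest rings = bonds - atoms + components."""
    parent = list(range(n_atoms + 1))  # atoms are numbered 1..n_atoms

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    components = len({find(i) for i in range(1, n_atoms + 1)})
    return len(bonds) - n_atoms + components
```

The count tells how many cycles the smallest set must contain (two for a naphthalene skeleton), which can serve as a consistency check on the cycles a ring-perception program reports.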

RING2  -  See  MUSTRP. 

RING3  -  See  COMPR. 

RING4 - See TABLE.

RUD  II  -  Registry  Print  Tape  Update  II  (2.3.4) 

RUD  II  updates  the  Print  tape  by  adding  new  records  for  a  group  of 
newly  registered  compounds  and  updating  records  corresponding  to 
previously  registered  compounds. 
SCNCAS - Key Assignment Executive (2.4.1)

SCNCAS is the executive for the Key Assignment programs. For each
compound on the input tape, it calls the sub-executive program, which in
turn calls the appropriate screening subroutines. SCNCAS writes the
compound record on tape in the same format, with the newly assigned keys
added to the record.

SCREEN  -  Key  Assignment  Sub-Executive  (2.4.2) 

Program SCREEN (other versions SCRNCR, SCRNDR) is a subroutine of
program SCNCAS and acts as an intermediary between it and the various
key assignment subroutines. SCREEN, the version used when hydrocarbon
radical and functional group fragment keys are being assigned, selects
the particular screen fragments that must be applied to the compound
being screened.

SCRNCR  -  See  SCREEN. 

SCRNDR  -  See  SCREEN. 

SETUP  -  Linear  String  Classification  (2.2.10) 

This  program  finds  a  capital  letter  in  the  SCRUB  list  and  then  scans 
to  the  left  and  right  of  this  letter  in  the  MATRIX  assigning  a  type 
code  to  the  linear  string.  It  then  transfers  to  CLEANM  for  processing. 

SLOAD - Loading of Structural Fragment Screens (2.4.3)

SLOAD  prepares  structural  fragment  data  for  use  by  the  screen  assignment 
program. 

STARTA  -  Master  Registry  Program  (2.3.1) 

STARTA determines which of a group of potential new compounds differ
from those already registered in the master file. The new compounds are
registered, positive matches are discarded, and questionable matches are
printed for further examination by a chemist.

STRUC  -  Atom-by-Atom  Search  (2.4.8) 

The purpose of the atom-by-atom search program is to determine whether a
one-to-one correspondence exists between the nodes (atoms) and
connections in a given query structure and some set of nodes and
connections in a given file compound structure.
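A correspondence of this kind can be found by backtracking: match query atoms one at a time to unused file atoms of the same element, and reject a choice as soon as a query bond to an already-matched atom has no file bond of the same type. The Python sketch below is a generic backtracking search, not necessarily the strategy STRUC uses; the dict-based structure representation and all names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Elements = Dict[int, str]            # atom number -> element symbol
BondMap = Dict[Tuple[int, int], int]  # (atom_a, atom_b) -> bond type


def bond_type(bonds: BondMap, a: int, b: int) -> Optional[int]:
    return bonds.get((a, b), bonds.get((b, a)))


def atom_by_atom(q_el: Elements, q_bonds: BondMap,
                 f_el: Elements, f_bonds: BondMap) -> Optional[Dict[int, int]]:
    """Return a query-atom -> file-atom mapping, or None if the query is not embedded."""
    q_adj: Dict[int, List[Tuple[int, int]]] = {a: [] for a in q_el}
    for (a, b), t in q_bonds.items():
        q_adj[a].append((b, t))
        q_adj[b].append((a, t))
    order = sorted(q_el)
    mapping: Dict[int, int] = {}
    used = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        qa = order[i]
        for fa in sorted(f_el):
            if fa in used or f_el[fa] != q_el[qa]:
                continue
            # Every query bond to an already-matched atom must exist with the same type.
            if all(bond_type(f_bonds, fa, mapping[qb]) == t
                   for qb, t in q_adj[qa] if qb in mapping):
                mapping[qa] = fa
                used.add(fa)
                if extend(i + 1):
                    return True
                del mapping[qa]  # undo and try the next candidate
                used.discard(fa)
        return False

    return dict(mapping) if extend(0) else None
```

Extra file bonds are allowed, so the query is matched as a substructure. A good ordering of query atoms reduces backtracking, which is why CONVRT orders query-fragment atoms to speed searching; this sketch simply takes them in numeric order.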

TABLE  -  Connection  Table  Expansion  (2.4.7) 

TABLE  (RING4)  expands  a  connection  table  given  in  the  compressed  format 
(see  program  CONVRT)  to  a  form  suitable  for  application  of  the  ring 
analysis  programs. 
TAPE - Search Executives (3.2.1)


Programs TAPE, TXINFO, and DISKTT are used to retrieve compounds from a
file stored either on tape or on disk. With the aid of the molecular
formula search and structure search programs, the executives determine
which of the selected compounds actually satisfy the requirements of
the various queries. Output programs are then called to print the
resulting compounds.

TAPWRM  -  Mergenthaler  Input  Program  (2.2.1) 

TAPWRM  reads  typewriter  characters  in  Mergenthaler  Code  from  a  magnetic 
tape.  It  interprets  these  codes  and  constructs  a  2-dimensional  array 
containing  an  image  of  the  typed  chemical  record. 

TICKER  -  Output  of  Chemical  Record  (2.2.16) 

TICKER  writes  an  output  tape  containing  the  TID,  classification  and 
stereo  information,  molform,  nomenclature  and  references,  structural 
formula  image,  connection  table,  and  abnormality  table. 

TXINFO  -  See  TAPE. 

UPTAP  -  CHEMTYPE  to  CIDS  Format  Conversion  (2.2.18) 

UPTAP  reformats  the  output  of  the  CHEMTYPE  system  into  the  CIDS 
record  format  and  merges  into  the  record  descriptors  which  were 
introduced  through  punched  cards. 

VERIFY  -  Chemical  Verification  (2.2.14) 

VERIFY  checks  the  chemical  consistency  of  the  structural  formula, 
molecular  formula,  and  connection  table,  and  verifies  the  valence  of 
each  element  in  the  connection  table  and  in  the  abnormality  table. 


APPENDIX C


CAS FORMATS INPUT TO CIDS


This appendix is a reproduction of selected portions of the Chemical
Abstracts Service Registry System Manual, revised February 1966.

It describes the formats of the CAS Structure Master File, which is the
input to CIDS program CASFMT (Section 2.1.1), and the CAS Bibliography
File, which is the input to CIDS program MOLEF (Section 2.1.4).
Structure File


The preceding page is a photographic reproduction of a tape dump of
part of the Structure File. The Structure File contains all the
information in the unique/compact connection table generated for a
compound by the Registry programs. On the left of the sheet, the
notations "BL" and "RC" appear. These notations, which are not on the
tape, are generated by the program that prints the tape dump. "BL"
stands for block length and is followed by a count of the characters in
that physical tape record. "RC" stands for record count and is followed
by a count of the characters appearing in the associated logical record.
For the Structure File, a logical record may be an "F1", "F2", "F3", or
"F4" record (see descriptions). Several logical records may appear
within one physical tape record.


Description


1. Block count (IBM standard). This four-position count appears at the
beginning of each physical tape record (block) and indicates the number
of characters in the block. As per IBM standard, the digits 0-9 in the
units position print as ?, A, B, C, D, E, F, G, H, and I, respectively.

2. Record count (IBM standard, required for variable-length blocked
records). This four-position count gives the number of characters in
the following logical tape record. The record count is the first field
of the logical record.

3. Record Type. This two-position field defines the contents of the
record.

4a. (3 is "F1") From list. The "from list" defines the graph of a
compound (see page 99).

5a. Record Mark. This character marks the end of a logical record.

4b. (3 is "F2") Element list. This list gives the node values for the
previous F1 record (see page 100).

5b. Record Mark.

4c. (3 is "F3") Bond list. This list gives the connection values for
the last previous F1 record (see page 101).

5c. Record Mark.

4d. (3 is "F4") Registry Number. A nine-position field containing the
CAS Registry Number assigned to the compound (see page 88).

5d. Hydrogen Count. A three-position field indicating the number of
hydrogen atoms in the compound.

6d. Abnormalities Count. This three-position count indicates the length
of the abnormalities present.

7d. Text Length. A two-position field indicating the length of the
textual descriptor.

8d. Abnormalities. A variable-length field containing ... and the
textual descriptor (see pages ...).

9d. Record Mark.


D. H. Leighner 1/12/66


CHEMICAL ABSTRACTS SERVICE FORMAT LAYOUT: F1 FROM LIST


The fields of the F1 record, in order, are:

1. Record character count (numeric).

2. Record identification: F1.

3. Length from position 8 to the ring closure list (numeric); blank if
there are no rings.

4. From attachment of the second atom, through from attachment of the
nth atom (numeric); a plus zone (+) indicates a discontinuity.

5. Blank or record mark: blank if ring closures follow; record mark if
there are no ring closures.

6. First ring closure, through last (mth) ring closure.

7. Record mark (end of record).


CHEMICAL ABSTRACTS SERVICE FORMAT LAYOUT: F3 BOND LIST


The fields of the F3 record, in order, are:

1. Record character count (numeric).

2. Record identification: F3.

3. First bond type, second bond type, through last (nth) bond type,
ending at position n+6.

4. Position n+7: first ring closure bond type, or end of record.

5. Second ring closure bond type, through last (mth) ring closure bond
type, ending at position n+m+6.

6. Position n+m+7: record mark (end of record).


MODIFICATION LIST IN F4 RECORD


The modification list is preceded by a bit switch in position 24, which
indicates which modifications are present:

1 bit on:    mass citations present
2 bit on:    charge citations present
4 bit on:    fractional coefficients present
AB bit on:   special segments present

Up to five subfields may be present in the modification list. They are
(1) valence, (2) mass, (3) charge, (4) fractional coefficients, and
(5) special segments; if present, they must appear in that order.

These subfields have the following format:

Vaaab...  Maaaddd...  Caaae...  Fgxxxx...  Shhh...

where:

V      valence subfield
aaa    atom number
b      valence
M      mass subfield
ddd    mass value
C      charge subfield
e      charge value
F      fractional coefficient subfield
g      fragment number
xxxx   fractional coefficient (two-digit numerator and denominator)
S      special segment subfield
hhh    segment number (ID)
dd     element symbol
K      no. of hydrogens attached
L      actual valence
mmm    mass citation
o      charge
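Reading the bit switch and checking the subfield order can be sketched in Python. The bit weights are an assumption: the sketch treats the switch as one six-bit BCD character with bits B, A, 8, 4, 2, 1 (octal 40, 20, 10, 4, 2, 1), so "AB bit on" means both zone bits are set. The subfield identifiers V, M, C, F, S come from the format above; the function names are hypothetical.

```python
from typing import List

# Assumed bit weights for one 6-bit BCD tape character (B, A, 8, 4, 2, 1).
BIT_1, BIT_2, BIT_4 = 0o01, 0o02, 0o04
ZONE_AB = 0o60  # both the A (0o20) and B (0o40) zone bits

SUBFIELD_ORDER = "VMCFS"  # valence, mass, charge, fractional coefficients, special segments


def modifications_present(switch: int) -> List[str]:
    """List the modifications flagged by the bit switch in position 24."""
    found = []
    if switch & BIT_1:
        found.append("mass citations")
    if switch & BIT_2:
        found.append("charge citations")
    if switch & BIT_4:
        found.append("fractional coefficients")
    if (switch & ZONE_AB) == ZONE_AB:
        found.append("special segments")
    return found


def in_required_order(subfield_ids: str) -> bool:
    """True if each subfield appears at most once and in V, M, C, F, S order."""
    ranks = [SUBFIELD_ORDER.index(s) for s in subfield_ids]
    return all(x < y for x, y in zip(ranks, ranks[1:]))
```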


K. L. Weisenberger


DOT MOLECULAR FORMULA


The dot molecular formula is composed of a summation molecular formula
for each disconnected fragment in the structure.

Associated with each of the molecular formula fragments except the first
is a coefficient that defines the number of occurrences of that
fragment. The coefficients of the fragments are normalized so that the
first fragment has a coefficient of one.

The following are the three types of molform fragments and the rules for
ordering within each fragment:

A. Carbon-containing fragments: (1) carbon, (2) hydrogen, (3) element
symbols in ascending alphabetical order.

B. Non-carbon-containing fragments: (1) element symbols in ascending
alphabetical order. Preferences are imposed here to change ... to
HgSO4, etc.

C. Single atom fragments: (1) hydrogen, (2) element symbol.

The full dot molecular formula is generated from the fragments by
assigning a preference to each fragment based on (1) high carbon count,
(2) high hydrogen count, (3) low alphabetical element symbol, and
(4) high element count. Single atom fragments always appear last.
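The ordering rules above can be sketched in Python. This is one reading of them, not the CAS program: fragments are dicts of element counts, a "single atom fragment" is taken to mean one non-hydrogen, non-carbon atom plus any hydrogens, the special preferences for non-carbon fragments (such as HgSO4) are not modeled, rule (3) compares the sorted hetero-element symbols, and coefficients are written as fractions in front of the fragment.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List

Fragment = Dict[str, int]  # element symbol -> atom count, hydrogens included


def is_single_atom(frag: Fragment) -> bool:
    heavy = {e: n for e, n in frag.items() if e != "H"}
    return "C" not in frag and sum(heavy.values()) == 1


def fragment_text(frag: Fragment) -> str:
    if "C" in frag:
        order = ["C", "H"] + sorted(e for e in frag if e not in ("C", "H"))
    elif is_single_atom(frag):
        order = ["H"] + [e for e in frag if e != "H"]  # hydrogen, then the element
    else:
        order = sorted(frag)  # plain alphabetical order; special preferences not modeled
    return "".join(e + (str(frag[e]) if frag[e] > 1 else "")
                   for e in order if frag.get(e))


def preference(frag: Fragment):
    """Sort key: single atoms last, then high C, high H, low symbols, high element count."""
    others = tuple(sorted(e for e in frag if e not in ("C", "H")))
    return (is_single_atom(frag), -frag.get("C", 0), -frag.get("H", 0), others,
            -sum(n for e, n in frag.items() if e not in ("C", "H")))


def dot_formula(fragments: List[Fragment]) -> str:
    counts = Counter(tuple(sorted(f.items())) for f in fragments)
    ordered = sorted(counts, key=lambda k: preference(dict(k)))
    first = counts[ordered[0]]  # normalize so the first fragment has coefficient one
    parts = []
    for i, key in enumerate(ordered):
        coef = Fraction(counts[key], first)
        text = fragment_text(dict(key))
        parts.append(text if i == 0 or coef == 1 else f"{coef}{text}")
    return ".".join(parts)
```

For example, an acetate fragment with one sodium ion gives `C2H3O2.Na`, and two acetate fragments with one water give `C2H3O2.1/2H2O`, showing the normalization to a fractional coefficient.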




FORMATS


1. Carbon-containing fragments: carbon and hydrogen have four-position
counts; all hetero atoms have three-position counts.

2. Non-carbon-containing fragments: all hetero atoms have three-position
counts.

3. Single atom fragments: these fragments have a fixed format, with
fields for the hydrogen count, the fractional coefficient, the element
symbol, and the number of hydrogens to be subtracted when generating a
summation molform.

4. Fractional coefficients: given in both a decimal representation and a
fractional representation.

5. Unknown coefficients: given in both a decimal representation and a
fractional representation.


EXAMPLES

[Two worked examples of encoded dot molecular formulas; not legible in this copy.]
APPENDIX D

PRINCIPAL CIDS FORMATS

As an aid to the reader, a collection of the most important formats
appearing in this report is presented here. In addition, a few formats
are provided that do not appear in the text.
CASFMT OUTPUT FORMAT


Word      Contents

0         Bits 0-17: Number of rings in structure
          Bits 18-35: Number of words in connection table
1-2       CAS registry number (9 characters, right-justified)
3-m       Connection table (CIDS format)
m+1-n     Abnormality table (if any; last word is 0)


X, B, E LISTS (INPUT TO CONVRT)


The input connection table consists of three lists, X, B, and E, in
which each atom and its connections are described in eight-word blocks.
The first eight words of each array are allocated to atom 1, the next
eight words to atom 2, and so on. Each eight-word block in the X list
contains the atom numbers for up to eight connections from that atom,
right-adjusted in consecutive words. The corresponding words in the B
list contain the bond type of each connection, right-adjusted. In the E
list, the first word of each group of eight contains the element kind
for that atom, right-adjusted in BCD. In addition, bit 17 is set to 1
for each word of the E list corresponding to an entry in the X list.

If the connection is a ring connection, the corresponding E word is set
minus. In the example below, the X, B, E representation is shown in
octal, with leading zeros omitted.


[Example: X, B, and E lists in octal for a small structure; the figure is not legible in this copy.]
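Building the X, B, and E lists from a list of connections can be sketched in Python. Assumptions: words are modeled as 36-bit integers with S as bit 0 (so bit 17 has the value 2^18, and "set minus" sets the sign bit of a sign-magnitude word), element kinds are supplied by the caller already encoded in BCD, and the input dict shape and function name are hypothetical.

```python
from typing import Dict, List, Tuple

SIGN = 1 << 35          # sign bit S, assuming sign-magnitude 36-bit words
BIT17 = 1 << (35 - 17)  # bit 17, counting S as bit 0

# connections[atom] = [(other_atom, bond_type, is_ring_connection), ...]
Connections = Dict[int, List[Tuple[int, int, bool]]]


def build_xbe(n_atoms: int, element_bcd: Dict[int, int],
              connections: Connections) -> Tuple[List[int], List[int], List[int]]:
    size = 8 * n_atoms
    x_list, b_list, e_list = [0] * size, [0] * size, [0] * size
    for atom in range(1, n_atoms + 1):
        base = 8 * (atom - 1)  # atom 1 -> words 0-7, atom 2 -> words 8-15, ...
        conns = connections.get(atom, [])
        if len(conns) > 8:
            raise ValueError(f"atom {atom} has more than eight connections")
        e_list[base] = element_bcd[atom]  # element kind, right-adjusted BCD
        for k, (other, bond, in_ring) in enumerate(conns):
            x_list[base + k] = other      # connected atom number
            b_list[base + k] = bond       # bond type of that connection
            e_list[base + k] |= BIT17     # marks an E word paired with an X entry
            if in_ring:
                e_list[base + k] |= SIGN  # ring connection: E word set minus
    return x_list, b_list, e_list
```

With element codes 6042 and 6045 (octal) for two singly bonded atoms, the first E word of each block comes out as 1006042 and 1006045 in octal: the element kind plus the bit-17 marker.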


INTERNAL CONNECTION TABLE (OUTPUT OF CONVRT)


The connection table is divided into three parts: the connection
segment, the bond index, and the bond segment (bond table). In addition,
the first word of the C.T. is an index to the three parts. The address
(bits 21-35) of the index word contains the relative location of the
bond index segment; the decrement (bits 3-17) contains the relative
location of the bond segment. This is illustrated below:


Word          Contents

0             00000x00000y    Index Word
1 to y-1                      Connection Segment
y to x-1                      Bond Index
x onward                      Bond Table
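Locating the three parts from the index word can be sketched in Python, numbering bits as in this report (S is bit 0, bit 35 is rightmost). Applied to the index word 000016000014 of the CONVRT example later in this appendix, the address 14 and decrement 16 (octal) place the bond index at word 12 and the bond table at word 14 (decimal). The function names are hypothetical.

```python
def field(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 36-bit word, with S as bit 0 (leftmost)."""
    return (word >> (35 - last)) & ((1 << (last - first + 1)) - 1)


def locate_parts(index_word: int) -> dict:
    bond_index = field(index_word, 21, 35)  # address: start of the bond index
    bond_table = field(index_word, 3, 17)   # decrement: start of the bond table
    return {
        "connection_segment": (1, bond_index - 1),   # inclusive word range
        "bond_index": (bond_index, bond_table - 1),  # inclusive word range
        "bond_table_start": bond_table,
    }
```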


Connection Segment: In the connection segment, carbon atoms with exactly
two attachments are not explicitly stored. Every other atom in the C.T.
is stored as follows:

1st word:

Bits       Contents
S          0
1          = 1 if atom is in a ring; = 0 otherwise
2-5        No. of connections to this atom
6          = 1 if 1st connection is part of a ring; = 0 otherwise
7-11       Path length to 1st connection
12-17      Atom no. of 1st connection
18-29      Element kind in BCD, right-justified
30-35      Node type

2nd word (if necessary):

Bits       Contents
S, 1-11    0
12         = 1 if 3rd connection is part of a ring; = 0 otherwise
13-17      Path length to 3rd connection
18-23      Atom no. of 3rd connection
24         = 1 if 2nd connection is part of a ring; = 0 otherwise
25-29      Path length to 2nd connection
30-35      Atom no. of 2nd connection

3rd, 4th words (if necessary): same format as the 2nd word, for the
remaining connections.
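The word layouts above can be decoded with a small Python sketch, again numbering bits with S as bit 0. Each connection is returned as a hypothetical `(ring_flag, path_length, atom_no)` tuple; the test values are the words for atoms 1 and 4 in the CONVRT example later in this appendix. The element kind is returned as the raw 12-bit BCD field rather than translated to a symbol, and all names are illustrative.

```python
from typing import Dict, List, Tuple

Conn = Tuple[int, int, int]  # (ring_flag, path_length, atom_no)


def field(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 36-bit word, with S as bit 0 (leftmost)."""
    return (word >> (35 - last)) & ((1 << (last - first + 1)) - 1)


def decode_first_word(word: int) -> Dict:
    return {
        "in_ring": field(word, 1, 1),
        "n_connections": field(word, 2, 5),
        "connections": [(field(word, 6, 6), field(word, 7, 11), field(word, 12, 17))],
        "element_bcd": field(word, 18, 29),
        "node_type": field(word, 30, 35),
    }


def decode_extra_word(word: int) -> List[Conn]:
    # Bits 24-35 hold the earlier connection (e.g. the 2nd), bits 12-23 the next (e.g. the 3rd).
    return [
        (field(word, 24, 24), field(word, 25, 29), field(word, 30, 35)),
        (field(word, 12, 12), field(word, 13, 17), field(word, 18, 23)),
    ]


def decode_atom(words: List[int]) -> Dict:
    """Decode one atom's entry: its first word plus any following words."""
    atom = decode_first_word(words[0])
    for word in words[1:]:
        atom["connections"].extend(decode_extra_word(word))
    # An unused half of the last word decodes as (0, 0, 0); drop it.
    atom["connections"] = atom["connections"][: atom["n_connections"]]
    return atom
```

Decoded this way, atom 1 of the example is a ring atom with three connections, two of them six-bond ring paths back to itself and one a single step to atom 2, which agrees with its three bond-table entries.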

Bond Index: The bond index serves to locate the entries in the bond
table corresponding to each atom in the connection segment. The format
is:

Word 1:

Bits       Relative location of bonds for
30-35      Atom 2
24-29      Atom 3
18-23      Atom 4
12-17      Atom 5
6-11       Atom 6
0-5        Atom 7

Word 2 (if necessary):

30-35      Atom 8
24-29      Atom 9

The table continues for as many words as necessary to provide an entry
for each atom. The last entry gives the relative location of the word
following the last word of the bond table.
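Turning the bond index into per-atom ranges of the bond table can be sketched in Python. Two assumptions are read off the CONVRT example later in this appendix rather than stated in the text: atom 1 has no index entry because its group starts at the beginning of the bond table, and the relative locations are counted from that beginning. With the example's bond index words 151411070603 and 16 (octal) and seven atoms, the groups come out as 3, 3, 1, 2, 3, 1, and 1 words, matching the fourteen bond-table words listed there.

```python
from typing import Dict, List, Tuple


def field(word: int, first: int, last: int) -> int:
    """Return bits first..last of a 36-bit word, with S as bit 0 (leftmost)."""
    return (word >> (35 - last)) & ((1 << (last - first + 1)) - 1)


def bond_groups(index_words: List[int], n_atoms: int) -> Dict[int, Tuple[int, int]]:
    """Map atom number -> (start, end) relative to the bond table; end is exclusive."""
    starts = [0]  # atom 1: assumed to begin at the start of the bond table
    for word in index_words:
        for k in range(6):  # six 6-bit entries per word, read from bits 30-35 leftward
            starts.append(field(word, 30 - 6 * k, 35 - 6 * k))
    if len(starts) < n_atoms + 1:
        raise ValueError("bond index too short for the number of atoms")
    # starts[n_atoms] is the final entry: the word following the bond table.
    return {atom: (starts[atom - 1], starts[atom]) for atom in range(1, n_atoms + 1)}
```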

Bond Table: The bond table consists of a number of groups (one group for
each atom) of bond entries. The location of the beginning of each group
is specified by the bond index. Each word of a given group represents
the bonds in a path from the given atom to another atom, in the form of
a string of three-bit digits, each of which represents the bond type of
one segment of the path. The rightmost six bits of each word contain the
number of the atom to which the string is connected. For a path of
length greater than 10, the bond string is continued in the next word,
where bits 30-35 are set to zero.

As an example, the octal representation of the connection table as
formatted by CONVRT for a sample compound (structural diagram not
legible in this copy) is shown below:

Index Word
    000016000014

Connection Segment
    234601602302    #1
    1024601
    030101602302    #2
    1040103
    010102604601    #3
    020102604601    #4
    305
    030304604501    #5
    1070106
    010105604601    #6
    010105604601    #7

Bond Index
    151411070603
    16

Bond Table
    44444401        #1
    44444401
    102
    101             #2
    203
    104
    202             #3
    102             #4
    12105
    12104           #5
    206
    207
    205             #6
    205             #7
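Decoding one bond-table word can be sketched in Python. The sketch assumes paths of at most ten bonds (continuation words are not handled) and that bond type 0 does not occur, so leading zero digits are padding; reading the digits left to right as the path leaves the given atom is also an assumption. The checks use words from the example above: 44444401 is a six-bond path of type-4 bonds back to atom 1, and atom 4's group holds 102 and 12105.

```python
from typing import List, Tuple


def decode_bond_word(word: int) -> Tuple[int, List[int]]:
    """Split one bond-table word into (target atom, bond types along the path)."""
    atom = word & 0o77  # rightmost six bits: atom at the far end of the path
    digits: List[int] = []
    rest = word >> 6
    while rest:
        digits.append(rest & 0o7)  # collect three-bit digits, least significant first
        rest >>= 3
    return atom, digits[::-1]  # read left to right, starting at the given atom
```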
ABNORMALITY TABLE


The abnormality table is a series of words, each giving information
about one atom that has an abnormal mass or valence or carries a charge.

Bits        Contents
S, 1, 2     Type of abnormality: 101 = charge, 110 = mass, 111 = valence
3-17        Atom number
18-35       Value of abnormal mass, abnormal valence, or signed charge.
            The sign of a signed charge is indicated by bit 18.

A word of zeros follows the last abnormality word.
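Packing and unpacking abnormality words can be sketched in Python, with S as bit 0 of a 36-bit word. The polarity of the charge sign (here bit 18 = 1 means negative, with the magnitude in bits 19-35) is an assumption, since the text says only that bit 18 carries the sign. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

TYPE_CODES = {"charge": 0b101, "mass": 0b110, "valence": 0b111}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}
SIGN_BIT_18 = 1 << (35 - 18)  # bit 18, counting S as bit 0


def pack_abnormality(kind: str, atom: int, value: int) -> int:
    if kind == "charge":
        # Assumption: bit 18 = 1 marks a negative charge; magnitude in bits 19-35.
        value_field = (SIGN_BIT_18 if value < 0 else 0) | abs(value)
    else:
        value_field = value
    return (TYPE_CODES[kind] << 33) | (atom << 18) | value_field


def unpack_abnormality(word: int) -> Tuple[str, int, int]:
    kind = TYPE_NAMES[word >> 33]   # bits S, 1, 2
    atom = (word >> 18) & 0o77777   # bits 3-17
    value_field = word & 0o777777   # bits 18-35
    if kind == "charge":
        magnitude = value_field & (SIGN_BIT_18 - 1)
        return kind, atom, (-magnitude if value_field & SIGN_BIT_18 else magnitude)
    return kind, atom, value_field


def read_table(words: Iterable[int]) -> List[Tuple[str, int, int]]:
    """Decode words up to the terminating word of zeros."""
    out = []
    for w in words:
        if w == 0:
            break
        out.append(unpack_abnormality(w))
    return out
```

Under these assumptions a charge of -1 on atom 3 packs to 500003400001 in octal.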
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MOLECULAR  FORMULA  FORMAT 


WORD 0
  Bits S-2     Set to 1 if addend mol form exists
  Bits 3-17    Total no. words in molform block, including header
  Bits 18-26   Multiplier of Hill parent, if a hydrate
  Bits 27-35   Multiplier of water, if a hydrate

  Multiplier format (9 bits): the first bit is 1 if the multiplier is a fraction, 0 if an integer. For a fraction, the remaining 8 bits hold a 4-bit numerator followed by a 4-bit denominator. If it is not a fraction, the multiplier occupies 8 bits, right justified.

WORD 1
  Bits S-3     No. words in mol. formula
  Bits 4-10    Number of oxygen atoms
  Bits 11-17   Number of nitrogen atoms
  Bits 18-26   Number of hydrogen atoms
  Bits 27-34   Number of carbon atoms
  Bit 35       Set if indefinite polymer

WORDS 2-M
  Bits S-11    Element kind (BCD) (Hill)
  Bits 12-17   Number of atoms of element
  Bits 18-29   Element kind (BCD) (Hill)
  Bits 30-35   Number of atoms of element

WORD M+1
  Bits S-8     Multiplier of first addend
  Bits 9-17    Multiplier of second addend
  Bits 18-26   Multiplier of third addend
  Bits 27-35   Multiplier of fourth addend
  FORMAT OF MULTIPLIER SAME AS WORD 0 MULTIPLIER

WORD M+2  SAME AS WORD 1, but for addend molform
WORD M+3  SAME AS WORDS 2-M, but for addend molform
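The 9-bit multiplier format (fraction flag, then either a 4-bit numerator and a 4-bit denominator or an 8-bit right-justified integer) can be decoded as sketched below. The sketch assumes that the flag is the leftmost of the 9 bits and that the numerator sits to the left of the denominator. The helper names are hypothetical.

```python
from fractions import Fraction


def decode_multiplier(field: int):
    """Decode one 9-bit multiplier field (assumed bit order described above)."""
    if not 0 <= field < (1 << 9):
        raise ValueError("multiplier field is 9 bits wide")
    if field >> 8:  # flag = 1: fraction
        return Fraction((field >> 4) & 0xF, field & 0xF)
    return field & 0xFF  # flag = 0: integer, 8 bits right justified


def addend_multipliers(word: int) -> list:
    """Split WORD M+1 into its four 9-bit multipliers (S-8, 9-17, 18-26, 27-35)."""
    return [decode_multiplier((word >> shift) & 0x1FF) for shift in (27, 18, 9, 0)]
```

Returning a `Fraction` keeps values such as 1/2 (a hemihydrate, for example) exact instead of rounding them to floating point.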
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CIDS  RECORD  FORMAT 


The output of ADDMF is an IOBS type 2 tape. Each compound record is a logical record. These are grouped into physical records of 1000 words or less. The CIDS record format follows:

Word   Bits      Contents
1      (3-17)    2's complement (no. of words preceding Addit. Reg. No.)
       (21-35)   2's complement (no. of words in logical record)
2      (3-17)    2's complement (no. of words preceding Abnormality Table)
       (21-35)   2's complement (no. of words preceding C.T.)
3      (3-17)    2's complement (no. of words preceding References)
       (21-35)   2's complement (no. of words preceding S.F.I.)
4      (3-17)    2's complement (no. of words preceding Keys)
       (21-35)   2's complement (no. of words preceding Qualifiers)
6                Primary Registry Number (BCD)
7                Mol Form
m                Additional Registry Number
n                Structure (C.T.)
o                Abnormality Table (if any)
p                Structural Formula Image (if any)
q                Reference (if any)
r                Qualifiers (if any)
s                Keys (2 words per key)

Note  that  several  of  the  data  blocks  will  be  empty.  The  pointers  to  these 
blocks  will  point  to  the  location  where  the  data  would  be  stored  if  present. 
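The header words hold complemented word counts in two 15-bit fields, bits 3-17 (the decrement) and bits 21-35 (the address). Here is a minimal sketch of building and reading such a word. It assumes plain 15-bit two's complement arithmetic, and the function names are hypothetical.

```python
MASK15 = (1 << 15) - 1


def twos15(count: int) -> int:
    """15-bit two's complement of a word count (a count of 0 stays 0)."""
    return (-count) & MASK15


def pack_pointer_word(decrement_count: int, address_count: int) -> int:
    """Complemented counts in bits 3-17 (decrement) and bits 21-35 (address)."""
    return (twos15(decrement_count) << 18) | twos15(address_count)


def unpack_pointer_word(word: int) -> tuple:
    decrement = (word >> 18) & MASK15
    address = word & MASK15
    # Complementing twice recovers the original count.
    return twos15(decrement), twos15(address)
```

For Word 1, the decrement would carry the number of words preceding the Additional Registry Number and the address the length of the logical record.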
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MF  SORT  KEY 


The four word MF Sort block is attached at the beginning of each compound record prior to Registry so that the records can be sorted in Hill formula sequence. The order of preference is C, H, followed by the other elements in alphabetical order.

Word 1:

Bits       Contents
S,1-8      No. carbon atoms
9-17       No. hydrogen atoms
18-29      1st element (2 BCD characters)
30-35      No. atoms of 1st element

Word 2:

Bits       Contents
S,1-11     2nd element (2 BCD characters)
12-17      No. atoms of 2nd element
18-29      3rd element (2 BCD characters)
30-35      No. atoms of 3rd element

Words 3, 4:

Same format as Word 2 for remaining elements. Unused words are set to zero.
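The element ordering above, together with the way the carbon and hydrogen counts lead Word 1, can be sketched as follows. The sketch treats words as unsigned integers, which is an assumption because the 7040 sign bit falls inside the carbon field. The function names are hypothetical.

```python
def hill_order(counts: dict) -> list:
    """C first, H second, then the remaining element symbols alphabetically."""
    head = [(el, counts[el]) for el in ("C", "H") if el in counts]
    rest = [(el, counts[el]) for el in sorted(counts) if el not in ("C", "H")]
    return head + rest


def sort_word1_prefix(carbons: int, hydrogens: int) -> int:
    """Carbon count in bits S,1-8 and hydrogen count in bits 9-17 of Word 1."""
    if not (0 <= carbons < 512 and 0 <= hydrogens < 512):
        raise ValueError("counts must fit in 9 bits")
    return (carbons << 27) | (hydrogens << 18)
```

Because the carbon count occupies the most significant bits, comparing Word 1 values orders records by carbon count first and hydrogen count second.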


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INPUT  TO  CHEMTYPE 


Input to the CHEMTYPE system is a magnetic tape with the chemical typewriter characters packed as follows:

Bits:  0-3 | 4-11 | 12-15 | 16-23 | 24-27 | 28-35


Physical  records  are  300  7040  words  long  (900  characters).  If  a  paper 
tape  ends  before  the  end  of  a  physical  record,  the  record  is  filled  out  with 
zeroes.  Each  paper  tape  record  is  followed  on  magnetic  tape  by  a  file  mark. 
The  final  paper  tape  on  a  magnetic  tape  appears  as  follows: 

(1)  paper  tape  characters 

(2)  physical  record  filled  with  zeroes  after  end  of  paper  tape 

(3)  file  mark 

(4)  a  physical  record  containing  all  7's 

(5)  a  second  file  mark 
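Since a 300-word physical record holds 900 characters, each 7040 word carries three characters. A plausible reading of the bit layout is that 8-bit characters sit in bits 4-11, 16-23, and 28-35. The sketch below rests on that assumption, and also on reading "all 7's" as octal 7s, i.e. every bit set. The function names are hypothetical.

```python
WORDS_PER_RECORD = 300
ALL_SEVENS = 0o777777777777  # 36 one-bits: twelve octal 7s (assumed meaning)


def unpack_word(word: int) -> list:
    """Assumed layout: 8-bit characters in bits 4-11, 16-23 and 28-35."""
    return [(word >> shift) & 0xFF for shift in (24, 12, 0)]


def unpack_record(words: list) -> list:
    if len(words) != WORDS_PER_RECORD:
        raise ValueError("physical records are 300 words long")
    return [ch for word in words for ch in unpack_word(word)]


def is_end_of_tape_record(words: list) -> bool:
    """True for the final physical record of all 7's (item 4 above)."""
    return len(words) == WORDS_PER_RECORD and all(w == ALL_SEVENS for w in words)
```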


[Figure: Mergenthaler coordinate punch code (binary), paper tape channels 8 through 1]


Data card input indicating which compounds on a paper tape are to be deleted is formatted as follows:

(Left Justified)


CHEMTYPE INTERNAL FORMATS

MATRIX CONTENTS AT END OF INPUT PROGRAMS:

  Bit 0        1 for underlined character
  Bits 27-28   Case bits: SUB = 01, LOWER = 10, UPPER = 11
  Bits 29-35   Input character without parity bit (Mergenthaler code)

MATRIX CONTENTS AT END OF ORGNZR:

BXBRAK - table of x coordinates of right hand brackets in structure. The last x coordinate is equal to DELX.


MULTAB AT END OF ORGNZR:

  Bits 22-35   Multiplier

The last multiplier is equal to 1.


MULTAB AT END OF REGRUP:

  Bits 3-17    2's complement pointer to last entry in SCRUB list for this multiplier (and this set of brackets)
  Bits 21-35   Multiplier


XTAB - a table formatted as follows:

  Left half (to bit 17)   Total atoms of this element
  Bits 24-35              Element (if a single element, the second character is a BCD blank)
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CT - the internal Connection Table, whose entries are formatted as follows:

  Bits 3-17    Number of atom bonded to
  Bits 18-20   Multiplier
  Bits 21-23   Bond type
  Bits 24-29   2nd letter of atom
  Bits 30-35   1st letter of atom

The multiplier points to an entry in the list of bracket multipliers (MULTAB, described in Section 2.2.7.2) where applicable. Each atom has 8 such entries, only the first of which contains the atom name.

The second and third contain the relative matrix location of this atom as follows:

Word 2 for a given atom:
  Bits 3-20    Same as above
  Bits 21-23   Bond type
  Bits 24-27   Zeroes
  Bits 28-35   3 low-order digits of relative matrix location

Word 3 for a given atom:
  Bits 3-20    Same as above
  Bits 21-23   Bond type
  Bits 24-29   Zeroes
  Bits 30-35   2 high-order digits of relative matrix location


For example, relative matrix location 14321 is stored with its three low-order digits (321) in Word 2 and its two high-order digits (14) in Word 3.
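The digit split between Word 2 and Word 3 is a simple division by 1000, shown below for the 14321 example. The sketch deliberately leaves out the bit-level encoding of the digits, which the text does not fully specify. Both function names are hypothetical.

```python
def split_matrix_location(location: int) -> tuple:
    """Return (2 high-order digits, 3 low-order digits) of a 5-digit location."""
    if not 0 <= location <= 99999:
        raise ValueError("relative matrix locations have at most 5 digits")
    return divmod(location, 1000)


def join_matrix_location(high: int, low: int) -> int:
    """Rebuild the location from the Word 3 (high) and Word 2 (low) digit groups."""
    return high * 1000 + low
```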


CHEMTYPE OUTPUT RECORD FORMAT

The output tape which is created by this system consists of variable length records, each record consisting of a single chemical record.

WORD 0      Header: No. of words in block, not including Word 0
WORDS 1-3   2's complement pointers to the first locations of MOLTAB, the connection table, the abnormality table (0 if not present), the nomenclature, and the SCRUB list
WORD 4      REGISTRY NUMBER (first 6 characters)
WORD 5      REGISTRY NUMBER (second 6 characters)
WORD 6      CLSTER (Classification and Stereo)
            MOLTAB BLOCK
            CONNECTION TABLE BLOCK
            ABNORMALITY TABLE BLOCK (if present)
            NOMENCLATURE & REFERENCE BLOCK
            SFI BLOCK (SCRUB LIST)

The SFI block header word holds the total charges outside of brackets, DELY, and DELX, and contains a bit that is set if an underline table exists.
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CHEMTYPE  DATA  FORMATS 


FORMAT OF CLSTER:

  Bits 15-17   Stereo: 1 = Non-Stereo, 2 = Stereo, 3 = Stereo Unidentified, 0 = not present
  Bits 30-35   Classification: 1 = unclassified, 2 = blank, 3 = classified


SCRUB LIST FORMAT:

  Bits 3-17    Relative matrix location
  Bits 30-35   Modified Dura code

Each entry also carries a case bit.
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NOMENCLATURE  FORMAT: 


Each nomenclature entry (one line of nomenclature) is delimited from the next by a 777 (octal) character. The last nomenclature word is filled out with zeroes. References are separated from each other by a 777 (octal) character, and the beginning of the reference field is preceded by a triple bond character in the output block. The last word in the reference field is also filled with zeroes.


Header of nomenclature block:

  Bits 3-17    2's comp. of number of nomenclature words
  Bits 21-35   2's comp. of total number of words in block

Characters are packed 4 per 7040 word, in bits 0-8, 9-17, 18-26, and 27-35. Each 9-bit character consists of a 6-bit Dura code, a Dura case bit, a superscript bit, and an underline bit.
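Splitting a nomenclature block into its entries comes down to unpacking four 9-bit characters per word and cutting at each 777 (octal) delimiter. The sketch below does not decode the fields inside a character, because their order within the 9 bits is not stated. It assumes that zero characters occur only as padding at the end of the block. The names are hypothetical.

```python
DELIMITER = 0o777  # nine one-bits separate nomenclature entries


def word_to_chars(word: int) -> list:
    """Four 9-bit characters in bits 0-8, 9-17, 18-26 and 27-35."""
    return [(word >> shift) & 0o777 for shift in (27, 18, 9, 0)]


def split_entries(words: list) -> list:
    chars = [c for w in words for c in word_to_chars(w)]
    entries, current = [], []
    for c in chars:
        if c == DELIMITER:
            entries.append(current)
            current = []
        else:
            current.append(c)
    # Assumption: trailing zero characters are padding in the last word.
    while current and current[-1] == 0:
        current.pop()
    if current:
        entries.append(current)
    return entries
```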
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REFERENCE  BLOCK  FORMAT 
(OUTPUT  OF  UPTAP) 


Word   Bits 3-17                  Bits 18-20   Bits 21-35
0      No. of words in table of   -            No. of words in reference block
       contents                                (including Word 0)
1      **                         1*           RA to CLSTER Block
2      **                         3*           RA to Nomenclature Block
3      **                         0*           RA to EA Number Block
4      CLSTER
5      Nomenclature
.
X      EA Number(s)

NOTE:
*  Type of data: 0 = BCD, 1 = Binary, 2 = Modified Dura, 3 = Compressed Modified Dura
** If the decrement is zero, no data is stored.
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ORDER  OF  DATA  CARDS  FOR  CIDS 
FRAGMENTS  AS  INPUT  TO  SLOAD 

REPLAC002000 

C  (Title  card) 

(Hydrocarbon  Radicals) 


REPLAC001000 

CNOP  (Title  card) 

(Functional Groups Containing C,N,O,P)
CNOS

(Functional Groups Containing C,N,O,S)


(Rest  of  Functional  Groups) 


000000 


(End  card) 


The data for each structural fragment consists of the following data cards:

Key  Number 

Molecular  Formula 
Connection  Table 
Abnormalities  (if  any) 

The key number is punched in the first 6 columns of the card. The formats for the mol form and C.T. cards can be found in CIDS No. ... (1965), with the exception that column ... of the first mol form card now indicates whether any abnormalities are present. Each abnormality is punched in the form XY=Z, where X is the abnormality type, Y is the atom number, and Z is the value of the abnormality. The abnormality types are V (Valence), C (Charge), and M (Mass). Examples: V1=5, C1=+1, M5=14.

The  data  for  the  next  fragment  follows  immediately  except  when  control 
cards  are  needed  to  separate  groups  or  blocks. 
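The XY=Z abnormality notation is regular enough to parse with a single regular expression, as in this sketch. It assumes that entries may be separated by commas or periods and that only charges carry an explicit sign. The function name is hypothetical.

```python
import re

ABNORMALITY = re.compile(r"([VCM])(\d+)=([+-]?\d+)")
TYPE_NAMES = {"V": "valence", "C": "charge", "M": "mass"}


def parse_abnormalities(card: str) -> list:
    """Return (type, atom number, value) for each XY=Z item on a card."""
    return [
        (TYPE_NAMES[kind], int(atom), int(value))
        for kind, atom, value in ABNORMALITY.findall(card)
    ]
```

Applied to the examples in the text, `parse_abnormalities("V1=5,C1=+1,M5=14.")` yields one tuple per abnormality.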
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OUTPUT  OF  SLOAD 


The output of SLOAD is a tape on which the first physical record contains the fragment data. The data associated with each fragment is stored as follows:

Word    Contents
1       D = No. of words in fragment record
        A = No. of words preceding structure (m)
2       D = No. of words preceding abnormality table (= 0 if no abnormalities)
        A = No. of words in structure
3       Molecular formula
m       Key number
m+1     Connection table
n       Abnormality table (if needed; a zero word follows the last entry)


The index is written as the second record on the output tape.

The decrement of the first word of this record contains the complement of the number of words in the index. This word is to be read into a location CROSCT, immediately preceding CROSS, when the tape is read for screen assignment.




INTERNAL KEY FORMATS

Each of the CIDS keys occupies two computer words. The first 3 bits of the first word give the key type, enabling the programs which process these keys to interpret the information in the remainder of the two words. The present CIDS keys have the following formats:

TYPE  0:  Structural  Fragment  Keys 


Word 1:  S,1,2 = 000 | 3-35 = BCD Code
Word 2:  Bit 35 = 1 if fragment is attached to a ring; = 0 if fragment is not attached to a ring

Example: Key 1-A-3 is stored internally in BCD.

TYPE  1:  Skeleton  Molecular  Formula 


Word 1:
  Bits S,1,2   001
  Bit 3        *
  Bits 4-10    C
  Bits 11-17   N
  Bits 18-23   O
  Bits 24-29   S
  Bits 30-35   P

Word 2:
  Bits S,1-5   Code 1
  Bits 6-11    Amt 1
  Bits 12-17   Code 2
  Bits 18-23   Amt 2
  Bits 24-29   Code 3
  Bits 30-35   Amt 3

In this key, the numbers of atoms of C, N, O, S, and P are stored in fixed positions in Word 1. For other elements occurring in the nucleus, codes from a table are stored in Word 2, each followed by the number of atoms of that element. Up to three of these "other elements" may occur, and they are stored in alphabetical order.

*Types 1, 2, 3: Bit 3 of word 1 = 0 if 2nd word is unused; = 1 otherwise.
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Example: Key "SKELMF C12 N1 Sb1" is stored internally as the following binary representation:

Word 1:  S,1,2 = 001 | 3 = 1 | 4-10 = 0001100 | 11-17 = 0000001 | 18-35 = 0 ... 0
Word 2:  S,1-5 = Code for Sb | 6-11 = 000001 | 12-35 = 0 ... 0

TYPE 2: Ring Molecular Formula

This key is stored in the same format as type 1, except that bits (S,1,2) of word 1 contain 010.
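The Type 1 and Type 2 layouts can be reproduced by a small packing sketch. The code table for "other elements" is not given in the text, so the caller supplies the 6-bit codes; the value 7 used for Sb in the check is purely hypothetical. The sketch also assumes that hydrogen is not part of the counts passed in. The function name is hypothetical.

```python
FIXED_FIELDS = (("C", 25, 7), ("N", 18, 7), ("O", 12, 6), ("S", 6, 6), ("P", 0, 6))
OTHER_SLOTS = ((30, 24), (18, 12), (6, 0))  # (code shift, amount shift) in Word 2


def pack_formula_key(counts: dict, other_codes: dict, key_type: int = 0b001):
    """Pack a Type 1 (0b001) or Type 2 (0b010) key as two 36-bit words.

    counts: atoms per element, hydrogen excluded (assumption).
    other_codes: hypothetical 6-bit table codes for elements other than C, N, O, S, P.
    """
    word1 = key_type << 33
    fixed = {name for name, _, _ in FIXED_FIELDS}
    for name, shift, width in FIXED_FIELDS:
        n = counts.get(name, 0)
        if not 0 <= n < (1 << width):
            raise ValueError(f"count for {name} does not fit in {width} bits")
        word1 |= n << shift
    others = sorted(el for el in counts if el not in fixed)  # alphabetical order
    if len(others) > 3:
        raise ValueError("at most three other elements fit in Word 2")
    word2 = 0
    for (code_shift, amt_shift), el in zip(OTHER_SLOTS, others):
        if not (0 <= counts[el] < 64 and 0 <= other_codes[el] < 64):
            raise ValueError(f"code or count for {el} does not fit in 6 bits")
        word2 |= (other_codes[el] << code_shift) | (counts[el] << amt_shift)
    if others:
        word1 |= 1 << 32  # bit 3: second word is used
    return word1, word2
```

With a hypothetical Sb code of 7, `pack_formula_key({"C": 12, "N": 1, "Sb": 1}, {"Sb": 7})` reproduces the Word 1 bit pattern of the SKELMF example.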

TYPE  3:  Redundant  Numerical  Ring  Population 


Word 1:  S,1,2 = 011 | 3 = * | 4-7 = R1 | 8-11 = R2 | 12-15 = R3 | ... | 32-35 = R8
Word 2:  S-3 = R9 | 4-7 = R10 | ... | 32-35 = R17

The  count  of  atoms  in  each  ring  of  a  nucleus  is  stored  in  ascending 
order  in  consecutive  4-bit  blocks  in  the  key.  If  a  ring  contains 
more  than  15  atoms,  0001  is  stored  as  the  atom  count. 

Example:  Key  "RNRP  5,6,10"  is  stored  internally  as  the  following 
binary  representation: 


Word 1:  S,1,2 = 011 | 3 = 0 | 4-7 = 0101 | 8-11 = 0110 | 12-15 = 1010 | 16-35 = 0 ... 0
Word 2:  S,1-35 = 0 ... 0
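The ring-population packing can be sketched as below. The text does not say whether rings larger than 15 atoms are sorted before or after being replaced by 0001. This sketch sorts by true size first, which is an assumption. The function name is hypothetical.

```python
def pack_ring_population(ring_sizes):
    """Type 3 (RNRP) key: up to 17 ring sizes as 4-bit counts in ascending order."""
    sizes = sorted(ring_sizes)
    if len(sizes) > 17:
        raise ValueError("the key holds at most 17 rings")
    # Assumption: sort by true size first, then store 0001 for rings over 15 atoms.
    nibbles = [1 if n > 15 else n for n in sizes]
    word1, word2 = 0b011 << 33, 0
    for i, n in enumerate(nibbles[:8]):   # R1-R8 in bits 4-7 ... 32-35
        word1 |= n << (28 - 4 * i)
    for j, n in enumerate(nibbles[8:]):   # R9-R17 in bits S-3 ... 32-35
        word2 |= n << (32 - 4 * j)
    if len(nibbles) > 8:
        word1 |= 1 << 32  # bit 3: second word is used
    return word1, word2
```

For the example key, `pack_ring_population([5, 6, 10])` places 0101, 0110 and 1010 in bits 4-15 and leaves Word 2 empty.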

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TYPE 4: Counts

Word 1:  S,1,2 = 100 | 3-6 = Subtype | 7-35 = Count
Word 2:  S,1-35 = 0 ... 0

This format is used to represent a variety of keys which are counts. The subtype field identifies the particular feature that is being counted.


Subtype   Count of:
0         Rings in nucleus
1         Double bonds in nucleus
2         Nuclei in molecule
3         Total number of direct attachments to all nuclei
4         Double bonds between C (acyclic)
5         Triple bonds between C (acyclic)
6         C-C-C configurations regardless of bonding (acyclic)
7         C-C-C configurations in which the central C also bears a C above and a C below (acyclic)
8         [C]-[C]: count of carbons connected by single bonds, any configuration
9         -[C]-: count of carbons in unbranched chain (single bonds)
10        Total ring count (summed over all nuclei in parent and addends, if any)


Example: Key "Number 2 3" or "Nuclei in Molecule = 3" is stored internally as the following binary representation:

Word 1:  S,1,2 = 100 | 3-6 = 0010 | 7-35 = 0 ... 011
Word 2:  S,1-35 = 0 ... 0
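The count-key layout (type 100, a 4-bit subtype, a 29-bit count, and an empty second word) can be packed and unpacked as sketched below. The function names are hypothetical.

```python
COUNT_MASK = (1 << 29) - 1


def pack_count_key(subtype: int, count: int):
    """Type 4 key: 100 in bits S,1,2; subtype in bits 3-6; count in bits 7-35."""
    if not 0 <= subtype < 16:
        raise ValueError("subtype must fit in bits 3-6")
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("count must fit in bits 7-35")
    return (0b100 << 33) | (subtype << 29) | count, 0


def unpack_count_key(word1: int) -> tuple:
    return (word1 >> 29) & 0xF, word1 & COUNT_MASK
```

`pack_count_key(2, 3)` corresponds to the "Nuclei in Molecule = 3" example.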
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TYPE 5: Subtype 0: Molecular Formula Key

Word 1:  S,1,2 = 101 | 3-5 = 000 | 6-17 = Element (BCD) | 18-35 = Count or 0
Word 2:  S,1-35 = 0 ... 0

This count of the number of atoms in the Hill formula is given for elements C, H, N, O, P, S, F, Cl, Br, I, Si, B. For other elements, 0 is stored instead of the count.

Example: Key "MOLFRM C 10" is stored internally in the following binary representation:

Word 1:  S,1,2 = 101 | 3-5 = 000 | 6-17 = 110000010011 (BCD) | 18-35 = 0 ... 01010
Word 2:  S,1-35 = 0 ... 0

TYPE  5:  Subtype  1:  Nonspecific  Functional  Group 


Word 1:  S,1,2 = 101 | 3-5 = 001 | 6-23 = 0 ... 0 | 24-35 = Element (BCD)
Word 2:  S,1-35 = 0 ... 0


Example: Key "NONSPCAS" is stored internally in the following binary representation:

Word 1:  S,1,2 = 101 | 3-5 = 001 | 6-23 = 0 ... 0 | 24-35 = 010001110010 (AS in BCD)
Word 2:  S,1-35 = 0 ... 0
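Both Type 5 subtypes can be packed with the sketch below. It assumes the standard IBM 7000-series BCD letter codes (A-I = 21-31 octal, J-R = 41-51, S-Z = 62-71, blank = 60), which agree with the "AS" bit pattern above. It also follows the MOLFRM example in right-justifying a one-letter symbol behind a BCD blank. The function names are hypothetical.

```python
COUNTED_ELEMENTS = {"C", "H", "N", "O", "P", "S", "F", "CL", "BR", "I", "SI", "B"}


def bcd(ch: str) -> int:
    """6-bit BCD code for an upper-case letter or blank (assumed IBM table)."""
    if ch == " ":
        return 0o60
    i = ord(ch) - ord("A")
    if 0 <= i <= 8:       # A-I
        return 0o21 + i
    if 9 <= i <= 17:      # J-R
        return 0o41 + (i - 9)
    if 18 <= i <= 25:     # S-Z
        return 0o62 + (i - 18)
    raise ValueError(f"no BCD letter code for {ch!r}")


def element_field(symbol: str) -> int:
    """Two BCD characters; a one-letter symbol gets a leading blank."""
    text = symbol.upper().rjust(2)
    return (bcd(text[0]) << 6) | bcd(text[1])


def molfrm_key(symbol: str, count: int):
    """Subtype 0: element in bits 6-17, count (or 0) in bits 18-35."""
    stored = count if symbol.upper() in COUNTED_ELEMENTS else 0
    if not 0 <= stored < (1 << 18):
        raise ValueError("count must fit in bits 18-35")
    return (0b101 << 33) | (0b000 << 30) | (element_field(symbol) << 18) | stored, 0


def nonspecific_key(symbol: str):
    """Subtype 1: bits 6-23 zero, element in bits 24-35."""
    return (0b101 << 33) | (0b001 << 30) | element_field(symbol), 0
```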


LIST-STRUCTURED  FILE  GENERATION 


Programs  NUFILE,  KEYSRT,  MERGE,  and  INDEX  together  create  or  update 
the  search  file  and  form  the  inverted  key  index.  The  formats  and  a 
brief  description  of  the  final  output  tapes  from  this  system  are  listed 
here  for  easy  reference. 

(1) The Tape Search File

The compound records in the Tape Search File are blocked in variable length physical records whose maximum size is 1000 words. A compound record is always entirely contained within one physical record. The information on each tape in the Search File, except the last, is terminated by an end-of-file mark followed by a small (10 word) dummy block. The last tape of the File is terminated by two consecutive end-of-file marks and a special ten word block containing information which NUFILE uses when updating the File. The first word of the special block contains the address which will be assigned to the next compound to be added to the File. The second word contains the total number of compounds in the File.

Word   Bits       Contents
1      (S,1-5)    Tape Number (1- )
       (6-18)     Record Number (0- )
       (19-35)    Relative Address (0- )
2                 Number of Compounds in the file (right adjusted)

(2) The Disk Search File

The compound records in the Disk Search File are blocked in 465 word physical records. Compounds may be split between two physical records, but never more than two. The end-of-tape and end-of-file sentinels are the same as for the Tape Search File, except that the special ten-word record at the end of the File contains the following information:

Word   Bits       Contents
1      (S,1-17)   Record (track) Number (1- )
       (18-35)    Relative Address (0-464)
2                 Number of unused words in last data record written on the tape (right adjusted)
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(3) List-of-Addresses File

The address-lists on this file are blocked in variable length records whose maximum size is 1000 words. The data on each tape except the last is terminated by one end-of-file mark followed by a ten-word dummy record. Two consecutive end-of-file marks terminate the last tape in the file. Each address-list (i.e., all the Search File addresses corresponding to a single key) is followed by a word of zeros, and the addresses comprising these lists are in ascending order.

The format of an address in the Index for the Tape Search File is:

Bits       Contents
(S,1-5)    Tape No. (1- )
(6-18)     Record No. (0- )
(19-35)    Relative Address (0- )

The format of an address in the Index for the Disk Search File is:

Bits       Contents
(S,1-17)   Track No. (1- )
(18-35)    Relative Address (0-464)
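The two address formats can be packed and unpacked as sketched below. Because the tape (or track) number occupies the most significant bits, sorting packed words as unsigned integers gives the ascending address order that the address-lists require. The sketch treats words as unsigned, which is an assumption, and the function names are hypothetical.

```python
def pack_tape_address(tape: int, record: int, rel: int) -> int:
    """Tape No. in S,1-5; Record No. in 6-18; Relative Address in 19-35."""
    if not (0 <= tape < 64 and 0 <= record < (1 << 13) and 0 <= rel < (1 << 17)):
        raise ValueError("field out of range")
    return (tape << 30) | (record << 17) | rel


def unpack_tape_address(word: int) -> tuple:
    return word >> 30, (word >> 17) & 0x1FFF, word & 0x1FFFF


def pack_disk_address(track: int, rel: int) -> int:
    """Track No. in S,1-17; Relative Address (0-464) in 18-35."""
    if not (0 <= track < (1 << 18) and 0 <= rel <= 464):
        raise ValueError("field out of range")
    return (track << 18) | rel


def unpack_disk_address(word: int) -> tuple:
    return word >> 18, word & 0x3FFFF
```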


(4)  The  Key-Address  List  and  INDX 

The  Key-Address  List  contains  each  key  (2  words)  coupled  with 
the  address  of  its  corresponding  list  on  the  List-of-Addresses  File 
(1  word).  The  format  of  this  address  is: 

Bits  Contents 

(S,1-17)  Track Number (0- )

(18-35)  Relative  Address  (0-464) 

This data is blocked in 465 word physical records. Since each logical record is three words, there can be as many as 155 keys per block (or track). The last key on this file is followed by a special sentinel "key" composed of two words of all 1 bits.

Each  Key-Address  List  tape,  except  the  last,  is  terminated  by  an 
end-of-file  mark  and  a  dummy  block.  The  Key-Address  List  data  on  the 
last  tape  in  the  file  is  terminated  by  two  consecutive  end-of-file  marks, 
directly  followed  by  the  third  level  of  the  Inverted  Key  List,  INDX. 

INDX identifies the first key on each track of the Key-Address List in order to provide quick access to the desired key list. INDX is always small enough to place on tape in one block (e.g., 1000 words would accommodate a file containing over 50,000 different keys).
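The capacity figures follow from simple arithmetic: 465 / 3 = 155 keys per track, and a 1000-word INDX block holds 333 three-word entries. Even with one entry reserved for the sentinel (an assumption), that covers 332 × 155 = 51,460 keys. The sketch below checks this arithmetic and shows one plausible way to use INDX: a binary search over the first keys of each track. The lookup helper and the example keys are hypothetical.

```python
from bisect import bisect_right

TRACK_WORDS = 465
WORDS_PER_KEY_RECORD = 3                                      # 2 key words + 1 address word
KEYS_PER_TRACK = TRACK_WORDS // WORDS_PER_KEY_RECORD          # 155
LAST_KEY_WORD = (KEYS_PER_TRACK - 1) * WORDS_PER_KEY_RECORD   # 462
INDX_ENTRIES = 1000 // 3                                      # 333 three-word entries
MAX_KEYS = (INDX_ENTRIES - 1) * KEYS_PER_TRACK                # one entry left for the sentinel

ALL_ONES = (1 << 36) - 1
SENTINEL_KEY = (ALL_ONES, ALL_ONES)  # two words of all 1 bits


def find_track(indx, key):
    """indx: ascending list of (first_key_on_track, track); keys are word pairs."""
    first_keys = [first for first, _ in indx]
    i = bisect_right(first_keys, key) - 1
    return None if i < 0 else indx[i][1]
```

Keys are compared as (first word, second word) tuples. This matches comparing the 72-bit key as one unsigned number.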


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The logical record format of INDX is:

Word   Bits        Contents
1      (S,1-35)    Key (1st half)
2      (S,1-35)    Key (2nd half)
3      (S,1-17)    Track (1- )

The last key to appear on INDX will naturally be the special sentinel. Since the last track on the Key-Address List may not be completely filled, the address corresponding to the sentinel key will not necessarily be word 462 on that track, as it would normally be for the last key on a track.




OUTPUT  OF  MOLE 


MOLE generates the following block of data containing the molecular formula, connection table, and abnormality table:

Word      Contents
1         A = No. of words preceding the C.T. (X+2)
          D = Total number of words (X+Y+Z+2)
2         A = No. of words in the C.T. (Y)
          D = No. of words preceding the Abnormality Table (X+Y+2, or 0 if no abnormalities)
3         Molecular Formula (X words)
X+2       Connection Table (Y words)
X+Y+2     Abnormality Table (Z words)

The molecular formula is stored in the same format as the Hill formula for a file compound. The connection table and abnormality table have the same format as in the file compound, except that redundancy has been removed from the C.T.


MOLECULAR  FORMULA  IN  QUERY 


Word of Formula     Bits     Contents
First Word          3-17     Number of words in molecular formula
                    19       = 1 if restricted search; = 0 otherwise
Additional          0-11     Element (BCD)
Element Words       12-20    Maximum number of atoms if range search;
                             number of atoms if exact search; = 0 otherwise
                    21-28    Minimum number of atoms if range search; = 0 otherwise
                    34       = 1 if search of atoms is a range search; = 0 otherwise
                    35       = 1 if search of atoms is an exact match search; = 0 otherwise
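An element word from the query formula could be tested against a file compound's atom count as sketched below. Treating a word with neither flag set as "no constraint on the count" is an assumption, since the text does not say how such words are used. The element code in the check is an arbitrary placeholder, and the function names are hypothetical.

```python
def make_element_word(element_code: int, max_atoms: int = 0, min_atoms: int = 0,
                      range_search: bool = False, exact: bool = False) -> int:
    """Element in bits 0-11, max/exact count in 12-20, min in 21-28, flags in 34 and 35."""
    return ((element_code << 24) | (max_atoms << 15) | (min_atoms << 7)
            | (int(range_search) << 1) | int(exact))


def atom_count_matches(word: int, count: int) -> bool:
    maximum = (word >> 15) & 0x1FF
    minimum = (word >> 7) & 0xFF
    if word & 1:                 # bit 35: exact match search
        return count == maximum
    if (word >> 1) & 1:          # bit 34: range search
        return minimum <= count <= maximum
    return True                  # assumption: no count constraint
```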


SEARCH  SYSTEM  OUTPUT 

QUERY NUMBER CIDS I

RN  A00005I4                       CIDS Registry Number
TN  T03603                         Identification Numbers
TN  X 00000637                     in File of Origin
C15 H12 N2                         Molecular Formula
[structure diagram]                Structure
Quinoline, 2-(p-aminophenyl)-      Nomenclature
TF7087, EAI698                     Reference Numbers
STEREO N CODE U                    Stereo and Security Classification
DC25440                            Edgewood Arsenal Document Code


APPENDIX  E 


ERROR  DETECTION  AND  ANALYSIS  BY  CHEMTYPE 


Compounds  are  rejected  during  processing  by  CHEMTYPE  as  a  result  of 
73  different  error  conditions.  Errors  are  detected  by  almost  every  individual 
program.  The  messages  and  a  description  of  the  conditions  that  cause  them 
are  described  below. 

It is difficult, however, to interpret the causes of the errors with complete certainty. In many instances, errors may be caused by the paper tape reader during the transfer of the paper tape image to magnetic tape. This may result in bits being dropped or added to a character (or characters), and the program may reject the compound for a reason not ascribed directly to the paper tape reader. In other words, in the case of such errors, a garbled record results and the true reason for the error is not reflected by the program printout.

ERROR  MESSAGES 


The following list gives the error messages generated by the CHEMTYPE system, the name of the program that detects them, and a description of the error itself.

(1)  Number  of  parity  errors  since  last  compound  entered.  (TAPWRM) 

The total parity errors found in Mergenthaler input between two good compounds is printed when each correct compound is entered into the file. This may be due to a typewriter malfunction, an error in the paper tape reader in which a bit is dropped or added, or an error in the procedure followed by the typist in correcting a parity error.

(2)  Parity  error  in  coordinate  input.  No  code  delete  found.  (TAPWRM) 

Either  a  paper  tape  read  error  occurred  resulting  in  parity  error 
or  the  typist  did  not  correct  a  parity  error  in  the  proper  way. 

(3)  Low  bit  not  punched  in  coordinates.  (TAPWRM) 

Parity error found in the coordinates was a result of the low bit not being punched.

(4) Typist goofed again. Last coordinate word not equal zero. (TAPWRM)

If the typist hits a character too quickly after doing a carriage control operation which results in coordinates, the character may land in the middle of the coordinates, since each set of coordinates consists of 6 punches.
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(5) Coordinate bit 2 or 3 punched. (TAPWRM)

Punch appears in coordinates where it shouldn't, as a result of a hardware error (typewriter punch error), a paper tape read error, or a typing error. The latter is caused by the typist responding incorrectly to the typewriter parity light.

(6)  Unidentified  input  character.  (TAPWRM) 

Character does not correspond to any legitimate Mergenthaler code. May be due to a paper tape read error which does not result in a parity error, or to a mispunch by the typewriter.

(7)  Undefined  symbol  in  record.  (INPUTD) 

The Dura punched an illegal code.

(8)  Overflowed  MATRIX  erasing  brackets.  (ORGNZR) 

Coordinate error resulting in misplaced characters in MATRIX.

(9)  Unidentified  character  found  in  nomenclature.  (MONIKR) 

Input  information  was  in  error  and  may  have  been  mispunched. 

(10)  Overflowed  nomenclature  block.  (MONIKR) 

The  nomenclature  information  was  longer  than  400  characters. 

(11)  Unable  to  Identify  character  found  in  MATRIX.  (ORGNZR) 

Input  is  an  unintelligible  character. 

(12)  Brackets  contain  character  other  than  bond  or  corner.  (ORGNZR) 
Wrong  character  in  brackets,  due  to  mispunch. 

(13)  Input  exceeds  10000  MATRIX. (TAPWRM) 

Either  input  record  was  too  large,  or  the  typist  typed  a 
lozenge  and  then  reverse  indexed  above  lozenge  and  typed  a 
character.  The  latter  would  result  in  a  y  coordinate  larger 
than  100  when  the  y  coordinate  is  corrected  on  the  basis  of  the 
lozenge  coordinate  =1.  This  could  also  have  been  caused  by 
a  tape  reader  error. 

(14) Compound too large for MATRIX. (INPUTD)

Compound  exceeds  dimensions  of  MATRIX. 

height:  100 

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length: 


(15) Tape reading problem. (TAPWRM)

Physical problem occurred while reading tape, resulting in an incomplete word being read during Mergenthaler input.

(16) Magnetic Tape read error. (INPUTD)

Magnetic tape reading problem in Dura Mach input.

(17)  First  character  found  in  MATRIX  not  number  or  letter.  (ORGNZR) 

The  first  character  of  TID  must  be  the  first  character  found 
in  MATRIX  and  must  be  a  number  or  letter. 

(18) Space not found after 12 registry number characters. (ORGNZR)
TID is too big, or typist did not skip a space.

(19)  No  match  found  in  table  of  classification.  (ORGNZR) 
Classification  information  is  unintelligible. 

(20)  Classification  found  to  have  more  than  8  characters.  (ORGNZR) 
Classification  information  is  unintelligible. 

(21)  Structure  extends  past  STEREO  field.  (ORGNZR) 

Typist went too far down in record, and part of the compound extends past the STEREO line of the input.

(22)  No  match  in  table  for  STEREO  information.  (ORGNZR) 

STEREO information is unintelligible.

(23)  Addend  mol form  missing.  (MOLFRM) 

The structure is bracketed and the addend molform is not present; there is no charge present to indicate that the structure is an ion.

(24)  Character  in  molform  wrong.  (MOLFRM) 

There is a syntax error in molform, or input tape was mispunched.

(25)  Character  following  blank  in  molform  not  a  dot.  (MOLFRM) 

There was a syntax error in the molform, or input tape was mispunched.
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(26) Cannot reorder scrub list. (ORGNZR)

This  is  an  addend  but  the  bracket  is  missing. 

(27) Addend presents problem. Too many characters for list. (REGRUP)

There was either an excessively long addend, or coordinates were wrong and other information got into the structure by mistake.

(28) Too many entries for scrub list. (ORGNZR)

Structure  consisted  of  more  than  700  characters. 

(29)  Nomenclature  field  missing.  (ORGNZR) 

No  nomenclature  field  was  found. 

(30)  Structural  formula  too  high.  (ORGNZR) 

Structure  extended  into  molform  line. 

(31)  STEREO  field  missing.  (TAPWRM) 

Typist either left out STEREO information, or somehow erased the S as a result of making a correction incorrectly.

(32) STEREO not typed after exit from S.F. field. (INPUTD)

Either program did not find stereo text, typist failed to type STEREO information, or the typewriter mispunched.

(33)  Platen  reversed  above  start  of  record.  (INPUTD) 

Typing  occurred  above  the  leading  wedge. 

(34)  Unknown  character  outside  brackets.  (EXCESS) 

There  was  an  input  error  due  to  a  mispunch  or  a  syntax  error. 

(35)  Small  letter  outside  of  brackets.  (EXCESS) 

Formula  outside  of  brackets  begins  with  a  small  letter  due  to 
a  syntax  error. 

(36) Polymer molform subscript not n. (MOLFRM)

Error  in  input. 

(37)  Fractional  addend  multiplier.  (MOLFRM) 

The  multiplier  of  an  addend  is  a  fraction  and  cannot  be 
handled  in  verification. 
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(38) Nothing found outside brackets by EXCESS. (EXCESS)

No characters found where they were expected. Input error.

(39)  Error  in  typing  structural  formula.  (CLEANM) 

Any violation of the typing conventions for the structural formula.


(40) Inadmissible linear string in S.F. (CLEANM)

(a) Unexpanded chemical line notation

ex. -CO-


(b)  Confusion  due  to  closeness  of  atoms 

ex.  -C1C1-  or  C1C1 


(41) Bond in wrong place. (SETUP)


(42)  Picture  too  scrunched  up.  (MAKECT) 

Analysis  problem  caused  by  closeness  of  characters. 


It  is  difficult  to  determine 
which  bond  belongs  to  upper 
right  carbon. 


(43)  Illegal  symbol  around  atom.  (CLEANM) 

An  illegal  symbol  has  been  detected  in  one  of  the  eight 
locations  surrounding  an  atom. 


(44)  H  in  wrong  place.  (CLEANM) 

Unexpanded  hydrogen  connection. 


ex. -CH2-
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(45) Non-straight attachment to carbon chain. (MAKECT)

(46)  Illegal  symbol  in  structural  formula  (SETUP) 

Symbol in structural formula other than bond, atom, number, charge, bracket corner, or mid-line dot.

(47)  Reject  symbol  typed.  (INPUT) 

Typist  pressed  ®  key  to  delete  record  typed  thus  far  for 
Dura  Mach  input. 

(48)  Box  found.  Record  deleted.  (TAPWRM) 

Typist pressed the box (□) key to delete record typed thus far for
Mergenthaler  input. 

(49)  Format  error  detected  by  CONVRT.  (CONVRT) 

The  compound  is  unprocessable  by  the  CIDS  system. 

(50) More than 19 abnormalities. (CLEANM)

The Abnormality Table (AT) has too many entries.

(51) Bond redundancy error. (CONVRT)

Atom A is connected to atom B, but atom B is not connected to atom A in the connection table.

(52) Incorrect symbol in bits 24-35 of CT. (NFCF)

An illegal symbol has been found in the connection table.

(53) Empty connection table. (NFCF)

There are no atoms listed in the connection table.

(54) Virgule found, fraction not followed by space. (TAPWRM)

Since the virgule is a non-spacing character, the typist must leave a space after a fraction to allow for spreading the fraction apart in the MATRIX.

(55) Illegal character in MATRIX. Program error. (CLEANM)

Character in MATRIX greater than 177.
(56) Registry number does not start with letter. (ORGNZR)


(57) Illegal character in Registry number. (ORGNZR)
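Error (51) amounts to a symmetry test on the connection table: every bond must be recorded from both of its ends. A minimal sketch, assuming a hypothetical in-memory table that maps each atom number to the list of atoms it is bonded to (the report's actual CT word layout is not modelled), might look like this:

```python
from typing import Dict, List, Tuple

# Hypothetical connection table: atom number -> neighbour atom numbers.
ConnectionTable = Dict[int, List[int]]


def find_bond_redundancy_errors(ct: ConnectionTable) -> List[Tuple[int, int]]:
    """Return (a, b) pairs where a lists b as a neighbour but b does not list a."""
    errors = []
    for a, neighbours in ct.items():
        for b in neighbours:
            if a not in ct.get(b, []):
                errors.append((a, b))
    return errors


# Ethanol heavy atoms C1-C2-O3, with the bond missing from O3's entry.
print(find_bond_redundancy_errors({1: [2], 2: [1, 3], 3: []}))  # [(2, 3)]
```

Checking from every atom, rather than only from the lower-numbered end of each bond, catches a one-sided entry whichever side it was recorded on.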


Chemical Verification Errors encountered by VERIFY:

CV(21) Illegal element was found in molecular formula.

CV(22) Illegal element symbol was found in the Connection Table.

CV(23) An element was found in the Connection Table having a valence too high to be valid for this element.

CV(24) Molecular formula was found to contain no Carbon.

CV(25) The multiplier of the first addend in the addend molecular formula was found to be zero.

CV(26) Hydrogen was found in the Connection Table.

CV(27) The assumed Hydrogen count for the compound was different from the Hydrogen count in the molecular formula.

CV(30) Illegal unknown attachment was found.

CV(31) Atom count for C, H, N, or O in Connection Table does not equal that in molecular formula.

CV(32) Illegal element symbol was found among elements not included in Connection Table.

CV(33) Total count for elements other than C, H, N, or O is not equal to that in Connection Table.

CV(41) Illegal valence found in the abnormality table.

CV(42) Total atom count for C, H, N, or O in addend molecular formula is not equal to that in the Hill molecular formula.

CV(43) Count for elements other than C, H, N, or O in addend molecular formula is not equal to that in the Hill molecular formula.

CV(44) Multiplier for Hill parent of a hydrate is a fraction and cannot at present be handled by verification.

CV(45) Total plus charges in the molecule do not equal the total minus charges.
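Several of the checks above (CV(24), CV(26), CV(27), CV(31) and CV(45)) can be illustrated with a small sketch. It is not the VERIFY program: the function `verify`, its input shapes, and the "normal valence" table used to derive the assumed hydrogen count are all hypothetical. Charges are ignored when computing hydrogens, abnormality-table entries are not handled, and the CV(31) test here covers only C, N and O, leaving hydrogen to CV(27).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed normal valences for deriving the hydrogen count (hypothetical;
# charges and abnormality-table entries are not taken into account).
NORMAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def verify(atoms: List[Tuple[str, int]],
           bonds: List[Tuple[int, int, int]],
           formula: Dict[str, int]) -> List[str]:
    """Return the CV error labels raised for a hydrogen-suppressed table.

    atoms:   (element symbol, charge) for each atom, numbered from 0
    bonds:   (atom i, atom j, bond order)
    formula: element symbol -> count in the molecular formula
    """
    errors: List[str] = []
    if formula.get("C", 0) == 0:
        errors.append("CV(24)")              # no carbon in molecular formula
    if any(el == "H" for el, _ in atoms):
        errors.append("CV(26)")              # hydrogen in the connection table

    bond_sum: Counter = Counter()
    for i, j, order in bonds:
        bond_sum[i] += order
        bond_sum[j] += order
    assumed_h = sum(max(0, NORMAL_VALENCE[el] - bond_sum[n])
                    for n, (el, _) in enumerate(atoms) if el in NORMAL_VALENCE)
    if assumed_h != formula.get("H", 0):
        errors.append("CV(27)")              # assumed H count disagrees

    ct_count = Counter(el for el, _ in atoms)
    if any(ct_count[el] != formula.get(el, 0) for el in ("C", "N", "O")):
        errors.append("CV(31)")              # C, N or O count disagrees

    plus = sum(q for _, q in atoms if q > 0)
    minus = -sum(q for _, q in atoms if q < 0)
    if plus != minus:
        errors.append("CV(45)")              # plus and minus charges unequal
    return errors


# Ethanol, C2H6O, entered as C-C-O with hydrogens suppressed.
print(verify([("C", 0), ("C", 0), ("O", 0)], [(0, 1, 1), (1, 2, 1)],
             {"C": 2, "H": 6, "O": 1}))      # []
```

For ethanol the assumed hydrogens are 3 + 2 + 1 = 6, matching the formula, so no error is raised. Changing the formula to C2H5O would raise CV(27).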
APPENDIX F


MERGENTHALER CHEMICAL TYPEWRITER CODES


OCT. CODE     LOWER CASE   UPPER CASE

101           a            A
102           b            B
303           c            C
104           d            D
305           e            E
306           f            F
107           g            G
110           h            H
311           i            I
312           j            J
113           k            K
314           l            L
115           m            M
116           n            N
(illegible)   o            O
120           p            P
321           q            Q
322           r            R
123           s            S
324           t            T
125           u            U
126           v            V
327           w            W
330           x            X
131           y            Y
132           z            Z

Codes for the digit and punctuation keys (characters not legible in the source): 245, 254, 55, 56, 60, 261, 262, 63, 264, 65, 66, 267, 270, 71, 273, 77, 135.

OCT. CODE     CONTROL CHARACTERS

000           Null code
377           Code delete
201           Power on
210           Backspace
11            Tab
12            Line feed
215           Carriage return
216           Upper case
17            Sub case
24            Stop code
30            Coord. follow
232           Lower case
234           Ribbon shift - White
35            Ribbon shift - Black
240           Space
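The octal codes in the table above follow a recognisable pattern. Each appears to be the 7-bit ASCII code for the key's upper-case letter, or for the corresponding ASCII control character, with an eighth bit (octal 200) set whenever it is needed to give an even number of 1 bits. Lower and upper case share one code and are selected by the Upper case (216) and Lower case (232) controls. The report does not state the parity scheme; it is an inference from the table, and the function names below are hypothetical. A sketch under that assumption:

```python
def with_even_parity(code7: int) -> int:
    """Set bit 8 (octal 200) if needed so the code has an even number of 1 bits."""
    if bin(code7).count("1") % 2 == 1:
        return code7 | 0o200
    return code7


def mergenthaler_code(ch: str) -> str:
    """Octal code for an upper-case letter or ASCII control character (assumed scheme)."""
    return format(with_even_parity(ord(ch)), "o")


for ch in ("A", "C", "W", "\r", "\x7f"):
    print(repr(ch), mergenthaler_code(ch))
```

Run this way, it reproduces the listed 101 for A, 303 for C, 327 for W, 215 for carriage return and 377 for code delete. Under the same assumption, the code for O, whose entry is not legible, would be 317.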

DURA MACH CHEMICAL TYPEWRITER CODES


CHEMICAL LINE PRINTER CODES

OCTAL CODE   CHARACTER (UPPER CASE)   CHARACTER (LOWER CASE)

01           A                        a
02           B                        b
03           C                        c
04           D                        d
05           E                        e
06           F                        f
07           G                        g
10           H                        h
11           I                        i
12           J                        j
13           K                        k
14           L                        l
15           M                        m
16           N                        n
17           O                        o
20           P                        p
21           Q                        q
22           R                        r
23           S                        s
24           T                        t
25           U                        u
26           V                        v
27           W                        w
30           X                        x
31           Y                        y
32           Z                        z
36           |||                      }
40           SPACE                    SPACE
60           0                        0
61           1                        1
62           2                        2
63           3                        3
64           4                        4
65           5                        5
66           6                        6
67           7                        7
70           8                        8
71           9                        9

Codes 43, 44, 46, 50, 52, 53 and 57 print #, $, &, (, *, + and / in upper case. Their lower-case characters, and both characters for codes 00, 33-35, 37, 41, 42, 45, 47, 51, 54-56 and 72-77, are not legible in the source.
CONTROL CODES

OCTAL   DEFINITION

15      Shift to upper case
17      File mark
35      Shift to upper case for one character
36      Function code: means Control Code follows
55      Shift to lower case
72      End of line
75      Shift to lower case for one character

Notes:

1. A function code must precede every control code.

2. The function code defines the next character as a control code, except where the function code is followed by an end of line; then the second character is also a control code.

3636 means print ||| or } depending on shift.
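The notes describe a small escape protocol: octal 36 marks the next code as a control code, and an end of line (72) carries one further control code with it, presumably one of the line space controls listed next. The hypothetical decoder below illustrates this. It assumes the printer starts in upper case and decodes only letters (01-32), space (40), digits (60-71) and the 3636 escape; every other printing code becomes "?", and the control code after an end of line is skipped rather than interpreted.

```python
from typing import List, Optional

FUNCTION_CODE = 0o36
END_OF_LINE = 0o72


def printable(code: int, upper: bool) -> str:
    """Map a printing code to text (partial, hypothetical mapping)."""
    if 0o01 <= code <= 0o32:                 # letters A-Z / a-z
        letter = chr(ord("A") + code - 1)
        return letter if upper else letter.lower()
    if code == 0o40:
        return " "
    if 0o60 <= code <= 0o71:                 # digits 0-9
        return str(code - 0o60)
    if code == FUNCTION_CODE:                # printed only via the 3636 escape
        return "|||" if upper else "}"
    return "?"                               # not decoded in this sketch


def decode(codes: List[int]) -> str:
    out: List[str] = []
    upper = True                             # assumed starting case
    one_shot: Optional[bool] = None          # case for the next character only
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == FUNCTION_CODE and i + 1 < len(codes):
            control = codes[i + 1]
            i += 2
            if control == FUNCTION_CODE:     # 3636: print the character at 36
                out.append(printable(FUNCTION_CODE, upper if one_shot is None else one_shot))
                one_shot = None
            elif control == 0o15:
                upper = True
            elif control == 0o55:
                upper = False
            elif control == 0o35:
                one_shot = True
            elif control == 0o75:
                one_shot = False
            elif control == END_OF_LINE:
                out.append("\n")
                if i < len(codes):           # note 2: next code is also a control code
                    i += 1                   # (e.g. a line space control; skipped here)
            # 17 (file mark) and unknown controls are ignored in this sketch
            continue
        out.append(printable(code, upper if one_shot is None else one_shot))
        one_shot = None
        i += 1
    return "".join(out)


print(repr(decode([0o36, 0o15, 0o03, 0o36, 0o75, 0o14, 0o36, 0o72, 0o21])))  # 'Cl\n'
```

Because a bare 15 or 55 without a preceding 36 is decoded as the letter M or as a printing character, the decoder only acts on control codes that follow a function code, as note 1 requires.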


LINE SPACE CONTROLS

20   Space 0 lines at six lines per inch
21   Space 1 line at six lines per inch
22   Space 2 lines at six lines per inch
23   Space 3 lines at six lines per inch
24   Space 4 lines at six lines per inch
25   Space 5 lines at six lines per inch
26   Space 6 lines at six lines per inch
27   Space 7 lines at six lines per inch
40   Skip to channel 0 (6 lines per inch)
41   Skip to channel 1 (6 lines per inch)
42   Skip to channel 2 (6 lines per inch)
43   Skip to channel 3 (6 lines per inch)
44   Skip to channel 4 (6 lines per inch)
45   Skip to channel 5 (6 lines per inch)
46   Skip to channel 6 (6 lines per inch)
47   Skip to channel 7 (6 lines per inch)
60   Space 0 lines at twelve lines per inch
61   Space 1 line at twelve lines per inch
62   Space 2 lines at twelve lines per inch
63   Space 3 lines at twelve lines per inch
64   Space 4 lines at twelve lines per inch
65   Space 5 lines at twelve lines per inch
66   Space 6 lines at twelve lines per inch
67   Space 7 lines at twelve lines per inch
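The line space controls follow a regular pattern. The high octal digit selects the operation (2: space at six lines per inch, 4: skip to channel, 6: space at twelve lines per inch), and the low digit gives the line count or channel number. The sketch below decodes that pattern; the function names are hypothetical. It also converts a space control into paper advance in inches. Channel skips are rejected because their distance presumably depends on the printer's carriage control tape rather than on the code itself.

```python
from typing import Tuple


def line_space_control(code: int) -> Tuple[str, int, int]:
    """Decode a line space control into (action, lines or channel, lines per inch)."""
    group, n = divmod(code, 0o10)        # high and low octal digits
    if group == 2:
        return ("space", n, 6)
    if group == 4:
        return ("skip to channel", n, 6)
    if group == 6:
        return ("space", n, 12)
    raise ValueError(f"{code:o} is not a line space control")


def advance_inches(code: int) -> float:
    """Paper advance for a space control; channel skips are not computable here."""
    action, n, lpi = line_space_control(code)
    if action != "space":
        raise ValueError("skip distance depends on the carriage control tape")
    return n / lpi


print(line_space_control(0o43))  # ('skip to channel', 3, 6)
print(advance_inches(0o63))      # 0.25
```

Spacing 3 lines at twelve lines per inch (code 63) advances the paper a quarter inch. Code 23 advances it half an inch.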